Top 6 Myths About Predictive Modeling

Despite what many think, the most important issue isn't which model to choose, and the biggest challenge isn't technical.

Bret Shroyer

November 21, 2014

Even if you’ve been hiding under a rock the past 25 years, it’s almost impossible to avoid hearing about how companies are turning around their results through better modeling or how new companies are entering into insurance using the power of predictive analytics. So now you’re ready to embrace what the 21st century has to offer and explore predictive analytics as a mainstream tool in property/casualty insurance. But misconceptions are still commonplace. Here are the top six myths dispelled:

Myth: Predictive modeling is mostly a technical challenge.

Fact: The predictive model is only one part of the analytics solution. It’s just a tool, and it needs to be managed well to be effective. The No. 1 point of failure in predictive analytics isn’t technical or theoretical (i.e., something wrong with the model) but rather a failure in execution. This realization shifts the burden of risk from the statisticians and model builders to the managers and executives. The carrier may have an organizational readiness problem or a management and measurement problem. The fatal flaw that’s going to derail a predictive analytics project isn’t in the model, but in the implementation plan.

Perhaps the most common manifestation of this is when the implementation plan around a predictive model is forced upon a group:

Underwriters are told that they must not renew accounts above a certain score
Actuaries are told that the models are now going to determine the rate plan
Managers are told that the models will define the growth strategy

In each of these cases, the plan is to replace human expertise with model output. This almost never ends well. Instead, the model should be used as a tool to enhance the effectiveness of the underwriter, actuary or manager.

Myth: The most important thing is to use the right kind of model.

Fact: The choice of model algorithm and the calibration of that model to the available data are almost never the most important things. Instead, the biggest challenge is simply having a credible body of data upon which to build a model. In “The Unreasonable Effectiveness of Data,” Google research directors Halevy, Norvig and Pereira wrote: “Invariably, simple models and a lot of data trump more elaborate models based on less data.” No amount of clever model selection and calibration can overcome the fundamental problem of not having enough data.

If you don’t have enough data, you still have some options: You could supplement in-house data with third-party, non-insurance data, append insurance industry aggregates and averages or possibly use a multi-carrier data consortium, as we are doing here at Valen.

Myth: It really doesn’t matter which model I use, as long as it’s predictive.

Fact: Assuming you have enough data to build a credible model, choosing the right model still matters a great deal, though maybe not for the reason you’d think. The right model might not be the one that delivers the most predictive power; it also has to be the model that has a high probability of success in application. For example, you might choose a model that is transparent and intuitive over one that relies on complex machine-learning techniques, if the intuitive model is the one that underwriters will actually use to help them make better business decisions.

Myth: Predictive modeling only works well for personal lines.

Fact: Personal lines were the first area of success for predictive modeling, owing to the large, homogeneous populations that they serve. But commercial lines can benefit from predictive modeling, too. There are successful models producing risk scores for workers’ compensation, E&S liability and even directors & officers risks. One of the keys to deploying predictive models in lines with thin policy data is to supplement that data, either with industry-wide statistics or with third-party (not necessarily insurance) data.

Myth: Better modeling will give me accurate prices at the policy level.

Fact: Until someone invents a time machine, the premiums we charge at inception will always be wrong. For policies that end up being loss-free, we will charge too much. For the policies that end up having losses, we will charge too little. This isn’t a bad thing, however. In fact, this cross-subsidization is the fundamental purpose of insurance and is necessary.

Instead of being 100% accurate at the policy level, the objective we should aim for in predictive analytics is to segment the entire portfolio of risks into smaller subdivisions, each of which is accurately priced. See the difference? Now the low-risk policies can cross-subsidize one another (and enjoy a lower rate), and the high-risk policies will also cross-subsidize one another (but at a higher rate). In this way, the final premiums charged will be fairer.

Myth: Good models will give me the right answers.

Fact: Good models will answer very specific questions, but, unless you’re asking the right questions, your model isn’t necessarily going to give you useful answers. Take time during the due diligence phase to figure out what the key questions are. Then when you start selecting or building models, you’ll be more likely to select a model with answers to the most important questions.

For example, there are (at least) two very different approaches to loss modeling:

Pure premium (loss) models can tell you which risks have the highest potential for loss. They don’t necessarily tell you why this is true, or whether the risk is profitable.
Loss ratio models can tell you which risks are the most profitable, where your rate plan may be out of alignment with risk or where the potential for loss is highest. However, they may not necessarily be able to differentiate between these scenarios.
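
To make the distinction concrete, here is a minimal sketch of the two measures these models target. The policy names, exposure units and dollar figures are made up for illustration, and real models predict these quantities from risk characteristics rather than computing them from known losses.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    exposure: float  # hypothetical exposure units (e.g., payroll in $100s)
    premium: float   # premium charged
    losses: float    # incurred losses


def pure_premium(p: Policy) -> float:
    """Loss per unit of exposure: how much loss the risk generates."""
    return p.losses / p.exposure


def loss_ratio(p: Policy) -> float:
    """Loss per dollar of premium: how the loss compares with the price."""
    return p.losses / p.premium


# Illustrative figures only (assumptions, not real data).
policies = [
    Policy("A", exposure=100.0, premium=10_000.0, losses=8_000.0),
    Policy("B", exposure=100.0, premium=4_000.0, losses=3_600.0),
]

# The two measures can rank the same risks differently.
highest_loss_potential = max(policies, key=pure_premium).name
least_profitable = max(policies, key=loss_ratio).name

if __name__ == "__main__":
    for p in policies:
        print(p.name, pure_premium(p), round(loss_ratio(p), 2))
    print("Highest loss potential:", highest_loss_potential)
    print("Least profitable:", least_profitable)
```

In this toy portfolio, policy A generates more loss per unit of exposure, yet policy B has the worse loss ratio. The loss ratio alone does not say whether B is a poor risk or simply underpriced, which is the ambiguity noted above.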

Make sure that the model is in perfect alignment with the most important questions, and you'll receive the greatest benefit from predictive analytics.
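
The earlier point about pricing segments rather than individual policies can also be shown with simple arithmetic. The sketch below is an illustration under loud assumptions: the eight policies and their losses are invented, the "low" and "high" labels stand in for a hypothetical model score, and rates are set from realized losses with no expenses or profit load, purely to keep the numbers visible.

```python
from collections import defaultdict

# Invented policies: (segment from a hypothetical model score, actual loss).
policies = [
    ("low", 0.0), ("low", 0.0), ("low", 0.0), ("low", 2_000.0),
    ("high", 0.0), ("high", 6_000.0), ("high", 0.0), ("high", 8_000.0),
]


def flat_rate(policies):
    """One premium for everyone: all losses spread across all policies."""
    return sum(loss for _, loss in policies) / len(policies)


def segment_rates(policies):
    """One premium per segment: each segment's losses spread over its own policies."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for segment, loss in policies:
        totals[segment] += loss
        counts[segment] += 1
    return {segment: totals[segment] / counts[segment] for segment in totals}


if __name__ == "__main__":
    print("Flat rate:", flat_rate(policies))
    print("Segment rates:", segment_rates(policies))
```

Under these assumptions, a single flat rate charges every policy the same amount, so the low-risk group subsidizes the high-risk group. With segment rates, each group's premium covers its own losses. No individual policy is priced "correctly" in either case, but the segmented portfolio is fairer.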