Is the Data Talking, or Your Biases?

All too often, we think we're using data to develop a product, but all we've done is build a model to validate our biases.

John Johansen

September 23, 2015

In April, a large life insurer announced plans to use Fitbit data and other health data to award points to insureds, providing impressive life insurance discounts for those who participated in "wellness-like" behaviors. The assumption is that people who own a Fitbit and who walk should have lower mortality. That sounds logical. But we're in insurance. In insurance, logic is less valuable than facts proven with data.

Biases can creep into the models we use to launch new products. Everyone comes to modeling with her own set of biases. In some conference room, there is probably something like this on a whiteboard: "If we can attract people who are 10% more active, in general, we will drive down our costs by 30%, allowing us to discount our product by 15%."

That is a product model. But that model was not likely based on tested data. It was likely a biased supposition pretending to be a model. Someone thought he used data, when all he did was build a model to validate his assumptions.

Whoa.

That statement should make us all pause, because it is a common occurrence: not everything that appears to be valid data necessarily portrays reality. Any data can be contorted to fit someone's storyline and produce an impostor. The key is to know the difference between data cleansing/preparation and excessive manipulation. We continually have to ask if we are building models to fit a preconceived notion or if we are letting the data drive the business to where it leads us.

Biases hurt results. When I was a kid, my Superman costume didn't make me Superman. It just let me collect some candy from the neighbors. Likewise, if insurers wish to enter into an alternate reality by using biased data, they shouldn't expect results that match their expectations. Rose-colored glasses tend to make the world look rosy.

Here's the exciting part, however. If we are careful with our assumptions, if we wisely use the new tools of predictive analytics and if we can restrain ourselves from jumping through our hypotheses and into the water too soon, objective data and analytics will transport us to new levels of reality! We will become hyper-knowledgeable instead of pseudo-hyper-knowledgeable.

Data, when it is used properly, is the key to new realms, the passport to new markets and to a secure source of future predictive understanding. First, however, we have to make it trustworthy.

Advocating good data stewardship and use.

In general, it should be easy to see when we're placing new products ahead of market testing and analysis. When it comes to insurance, real math knows best. We've spent many decades perfecting actuarial science. We don't want to toss out fact-based decisions now that we have even more complete, accurate data and better tools to analyze the data.

When we don't use or properly understand data, weak assumptions begin to form. As more accurate data accumulates and we are forced to compare that data with our pre-conceived notions, we may be faced with the reality that our assumptions took us down the wrong path. A great example of this was long-term care insurance. Many companies rushed products to market, only later realizing that their pricing assumptions were flawed because of larger-than-expected claims. Some had to exit the business. The companies remaining in LTC made major price increases.

Auto insurers run into the same dangers (and more) with untested assumptions. For example, who receives discounts, and who should receive discounts? Recently, a popular auto insurer that was giving discounts to drivers with installed telematics announced that it would begin increasing premiums on drivers who seemed to have risky driving habits. The company had assumed that those who chose to use telematics would be good drivers and that just having the telematics would cause them to drive more safely. The resulting data, however, proved that some discounts were unwarranted; just because someone was willing to be monitored didn't mean she was a safe driver.

Now the company is basing pricing on actual data. It has also implemented a new pricing model by testing it in one state before rolling it out broadly - another step in the right direction.

When we either predict outcomes before analyzing the data or we use data improperly, we taint the model we're trying to build. It's easy to do. Biases and assumptions can be subtle, creeping silently into otherwise viable formulas.

Let's say that I'm an auto insurer. Based on an analysis of the universe of auto claims, I decide to give a discount to 20% of my U.S. drivers (the ones with the lowest claims). I'm assuming that my mix of drivers is the same as the mix throughout the universe of drivers. After a year of experience, I find that I am having higher claims than I anticipated. When I apply my claims experience to my portfolio, I find that, actually, only the top 5% are a safe bet for a discount, based on a number of factors. Now I've given a discount to an extra 15% of my drivers who ought not to have had it. Had I tested the product, I might have found that my top 20% of U.S. drivers were safe drivers but were also driving higher-priced vehicles - those with a generally higher cost per claim. The global experience didn't match my regional reality.
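To make the arithmetic in that example concrete, here is a minimal sketch in Python. Every figure in it is invented for illustration: the two lists of annual claim costs, the 20-driver portfolio size and the helper names `discount_threshold` and `qualifying_share` are all assumptions, not data from any insurer. It derives a cutoff from the broad "universe" of drivers and then checks how many drivers in my own book actually fall under that cutoff.

```python
# Hypothetical annual claim cost per driver, in dollars (invented data).
# The "universe" is the broad benchmark population the discount was designed on.
universe_costs = [100 * i for i in range(1, 21)]          # 100, 200, ..., 2000

# My own portfolio: same number of drivers, but costlier claims on average
# (for instance, because they drive higher-priced vehicles).
my_costs = [350] + [600 + 100 * i for i in range(19)]      # 350, 600, ..., 2400


def discount_threshold(claim_costs, share):
    """Return the claim cost at or below which the cheapest `share` of drivers fall."""
    ordered = sorted(claim_costs)
    count = max(round(len(ordered) * share), 1)
    return ordered[count - 1]


def qualifying_share(claim_costs, threshold):
    """Return the fraction of drivers whose claim cost is at or below `threshold`."""
    qualifying = sum(1 for cost in claim_costs if cost <= threshold)
    return qualifying / len(claim_costs)


planned_share = 0.20                                        # discount I planned to give
threshold = discount_threshold(universe_costs, planned_share)
warranted_share = qualifying_share(my_costs, threshold)     # what my own data supports
over_discounted = planned_share - warranted_share

print(f"Benchmark cutoff: ${threshold}")
print(f"Share of my drivers under the cutoff: {warranted_share:.0%}")
print(f"Share discounted without support from my data: {over_discounted:.0%}")
```

With these made-up numbers, the benchmark says 20% of drivers deserve the discount, while only 5% of my own book clears the same cutoff. The point is not the specific figures but the check itself: run the benchmark rule against your own experience before the discount goes to market.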

Predictions based on actual historical experience, such as claims, will always give us a better picture than our "logical" forays into pricing and product development. In some ways, letting data drive your organization's decisions is much like the coming surge of autonomous vehicles. There will be a lot of testing, a little letting go (of the steering wheel) and then a wave of creativity surrounding how the vehicle can be used effectively. The result of letting the real data talk will be the profitability and longevity of superior models and a tidal wave of new uses. Decisions based on reality will be worth the wait.