February 13, 2018
Is Big Data a Sort of Voodoo Economics?
by Bill Wilson
Some predictive models are “Rube Goldberg” constructs; the worst resemble “a bunch of monkeys heading up the Manhattan Project.”
Is charging consumers more for their insurance because they use a Hotmail email account or have a “non-English-sounding” name a valid application of predictive modeling, or does it constitute presumptive modeling and unfair discrimination? Does it matter if “big data” is riddled with bad data and bogus information as long as it improves insurer expense ratios? Is this the insurance industry’s version of voodoo economics?
It’s no secret that I’ve written things about the Holy Crusade known as insurtech that are critical or at least suggest caution in climbing aboard the hype and hyperbole bandwagon. Insurtech has been touted as the philosopher’s stone with its ability to turn “lead” data into golden predictions.
One component of this “movement” is big data, along with the data analytics and predictive modeling built on it, which is touted as the miracle cure for perceived stagnant industry profits.
There is nothing new about the importance and value of data and its wiser big brother, information. Predictability, in the aggregate, is the cornerstone of industry stability and profitability. It’s the foundation of actuarial science. But, to be of value, the data must be credible, and the models that use it must be predictive by more than mere correlation. And, to be usable, the data and models must meet legal requirements by being risk-based and nondiscriminatory. That’s where one of my concerns lies. Just how valid and relevant is the data, and how is it being used?
What prompted this article was a blurb in Shefi Ben-Hutta’s Coverager newsletter [emphasis added]:
“Certain domain names are associated with more accidents than others. We use a variety of pieces of information to accurately produce a competitive price for our customers.” – Admiral Group in response to research by The Sun that found the insurer could charge users…extra on their car insurance, simply for using a Hotmail email account instead of a Gmail one.
This revelation came just days after The Sun ran an article accusing the U.K. insurer of charging drivers with non-English-sounding names as much as £900 extra for their insurance. I don’t know enough about insurance in the U.K. to opine about the potential discriminatory nature of jacking premiums on people whose names don’t sound “English,” but my guess is that U.S. state insurance departments likely would not look favorably on this as a rating factor.
Historically in the U.S., P&C insurance rates have been largely based on factors that are easily ascertained and confirmed. For example, the “COPE” acronym (construction, occupancy, protection and exposure) incorporates most of the factors used in determining a property insurance rate. From the standpoint of the fire peril, frame construction is riskier than fire-resistive construction. A woodworking shop is riskier than an office. Having a fire hydrant 2,000 feet away from a building is riskier than having one 200 feet away. It makes sense. It’s understandable. It’s provable.
The risk inherent in these factors is demonstrable. The factors are understandable by consumers and business owners. It’s easy to identify what insureds can do to improve their risk profile and reduce their premiums. Advice can be given on how to construct a building, install protective systems, etc. to reduce risk and insurance costs. Traditional actuarial models are proven commodities, and state insurance regulators have the expertise and ability to evaluate the efficacy of rate changes.
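As a rough illustration of what “understandable” means here, the following Python sketch rates a building from COPE-style factors and itemizes each factor’s contribution. Every name and number in it (BASE_RATE_PER_100, the relativity tables, the 1,000-foot hydrant breakpoint) is hypothetical and invented for this example; none comes from an actual rating plan or filing.

```python
# Hypothetical illustration only: all rates and relativities are invented.
BASE_RATE_PER_100 = 0.50  # assumed base rate per $100 of insured value

CONSTRUCTION = {"frame": 1.40, "masonry": 1.10, "fire_resistive": 0.80}
OCCUPANCY = {"office": 0.90, "woodworking": 1.60}


def hydrant_relativity(distance_ft):
    """Assumed breakpoint: a hydrant within 1,000 feet earns a credit."""
    return 0.95 if distance_ft <= 1000 else 1.15


def rate_building(insured_value, construction, occupancy, hydrant_ft):
    """Return (premium, itemized factors) for a simple multiplicative rate."""
    factors = {
        "construction": CONSTRUCTION[construction],
        "occupancy": OCCUPANCY[occupancy],
        "protection": hydrant_relativity(hydrant_ft),
    }
    premium = insured_value / 100 * BASE_RATE_PER_100
    for relativity in factors.values():
        premium *= relativity
    return round(premium, 2), factors


if __name__ == "__main__":
    premium, factors = rate_building(500_000, "frame", "woodworking", 2000)
    print(premium, factors)
```

Because the premium is just a base rate multiplied by a handful of named relativities, an agent or regulator could point to any one of them and say what it costs and what would change it. That is the property a 600-variable “black box” gives up.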
What these factors are not, in many cases, is inexpensive. Confirming this information may require a physical inspection. Some state laws require or compel such inspections. In my state, our valued policy law says that buildings must be inspected within 60 days of policy inception or the law is triggered and a carrier may have to pay policy face value for a total fire loss. Are the insurtech startups selling homeowners insurance even aware of this? It is understandable that insurers want to reduce any unnecessary underwriting expenses if there are acceptable alternatives. Doing so may improve profitability or make them more competitive by enabling premium reductions.
This is where insurtech and technology in general can play a valuable role. Using reliable data on construction and size of buildings, building code inspection reports, satellite mapping for hydrant location and so forth can have an almost immediate impact on the carrier expense side and potentially the loss component. To a large extent, this is actually being done, but the search for something more (or less, if we’re talking about expenses) continues.
Enter “big data” and predictive modeling, along with a horde of people who know absolutely nothing about the insurance industry but a lot about deluding gullible people with hip press releases. They tout the salvation of phone apps, AI bots and “black box” rating algorithms with 600 variables and factors. Factors such as whether someone, according to their Facebook page or other online source, bowls in a Wednesday night mixed league where (speaking from personal experience) the focus is more on beer consumption than bowling and how that might affect the risk of an auto accident.
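The worry about hundreds of rating variables can be made concrete with a small simulation. The Python sketch below is an assumption-laden toy, not a model of any insurer’s algorithm: it generates 600 made-up “factors” that are pure random noise, compares each to a loss figure that is also pure noise, and counts how many appear correlated anyway. The function names, the sample size of 200 policyholders and the 0.14 cutoff (roughly the conventional 5% significance threshold at that sample size) are all choices made for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spurious_scan(n_policyholders=200, n_factors=600, seed=0):
    """Correlate many random 'factors' with a random loss outcome."""
    rng = random.Random(seed)
    # The outcome is pure noise, unrelated to any factor by construction.
    losses = [rng.gauss(0, 1) for _ in range(n_policyholders)]
    correlations = []
    for _ in range(n_factors):
        factor = [rng.gauss(0, 1) for _ in range(n_policyholders)]
        correlations.append(abs(pearson(factor, losses)))
    return correlations


if __name__ == "__main__":
    corrs = spurious_scan()
    print("strongest chance correlation:", round(max(corrs), 3))
    print("factors with |r| > 0.14:", sum(c > 0.14 for c in corrs))
```

With enough candidate variables, some will look “associated with more accidents” purely by chance, which is why correlation alone is weak evidence that a factor is risk-based.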
The $64,000 question is: How reliable are these predictive model algorithms, and how credible is the data they use? The author of an article titled “How Trustworthy Is Big Data?” claims that there is typically a lot less control and governance built into big data systems compared with traditional architectures:
“Most organizations base their business decision-making on some form of data warehouse that contains carefully curated data. But big data systems typically involve raw, unprocessed data in some form of a data lake, where different data reduction, scrubbing and processing techniques are then applied as needed.”
In other words, there may be little up-front vetting of the information because that takes time and costs money and, when acquired, there is no certainty that the data will ever be used. So, the approach may be to vet the data only when used, and, as the article suggests, that can be problematic.
The article also addresses the ethics of acquiring information on individuals for what may be perceived as nefarious reasons (e.g., price optimization):
“Just because something is now feasible doesn’t mean that it’s a good idea. If your customers would find it creepy to discover just how much you know about their activities, it’s probably a good indication that you shouldn’t be doing it.”
Going back to The Sun’s Admiral reports, what impression would an ad reading “Pay less if you have an English-sounding name!” make on Admiral’s customers? Would any insurer advertise something it is allegedly doing behind closed doors? It’s like the ethical decision criterion: What would your mother think if she knew what you were about to do? The right to do something doesn’t mean that doing it is right. Does black-box algorithmic rating enable and potentially protect this practice?
I mentioned at the outset of this article that the Admiral report prompted the article. What compelled the article was a recent personal experience when I received a $592 auto insurance invoice a little more than two months into my policy. The invoice attachments never really said why the carrier wanted additional premium, but a quick review indicated the reason.
Our son moved out of the house three years ago, and we removed him from our insurance program, including his vehicle. He still uses the same agency (different insurer) that I’ve used since 1973 to insure his auto, condo and personal umbrella. Our insurer learned that his vehicle registration notice is still mailed to our address. With that information, they (i.e., their underwriting model) unilaterally concluded that he still must live here, so they added him back to our insurance program and made him the primary driver of one of our three autos (the most expensive one, of course). I’m not sure what they thought happened to his vehicle. But, of course, no one “thought” about anything. An algorithmic decision tree spit out a boilerplate invoice.
I’ve been with this carrier now for four years, loss-free, and paid them somewhere in the neighborhood of $20,000 in premiums, yet they could not invest 10 minutes of a clerical person’s time to make a phone call and confirm my son’s residency. Neither we nor our agent received any notice or inquiry prior to the invoice, but my agency CSR (who, I’m happy to report, is still an empathetic human) was able to quickly fix the problem.
I have written about my personal experiences with a prior insurer involving credit scores. My homeowners premium was increased by $1,000 and, by law, I was advised that it was due to credit scoring. As it turned out, the credit reports of a Wilson couple in Colorado were used. Two years later, my homeowners premium was bumped $700 based on three “reason codes,” which I was able to prove were bogus, and the carrier rescinded the invoice. Now I’m being told that my current insurer’s information source tells them that my son has moved back home. I realize that these tales are anecdotal, but three instances in five years? How pervasive is this misinformation?
Is this what “big data” brings to the table? Big, BAD data and voodoo presumptive (not predictive) modeling? Who really benefits from this? Anyone? One of the insurtech buzzwords going around is “transparency.” What’s transparent about “black box” underwriting and rating?
At a convention last year, I spoke at length to a data scientist who was formerly with IBM and is now an insurance industry consultant. Without naming names, he characterized some of the predictive models he has examined as “Rube Goldberg” constructs, with the worst ones resembling “a bunch of monkeys heading up the Manhattan Project.”
Another consultant expressed his concern about some data companies. An NAIC presentation he attended listed some parameters relative to data points being used by carriers, and the presenter expressed confidence that carriers were disclosing all of their data points. The consultant is convinced, however, that carriers are using 25% to 50% more data points than the NAIC seems to be aware of. He has written about the abuse of data that lacks an actuarial grounding in risk assessment, which, again, is a requirement of some state laws.
Among the many problems with “black box” rating is the fact that no one may be able to explain how a particular premium was derived. No one may be able to tell someone how to reduce their premium. Perhaps most important, regulators may be unable to determine if the methodology results in rates that are unfairly discriminatory or otherwise violate state laws that require that rates be risk-based. Presumably, future rate filings will simply be a giant electronic file stamped “Trust Me.”
“Big data” might be beneficial to insurers from a cost, profitability and competitive standpoint, but it’s not clear how or even if it will affect consumers in a positive way. All the benefits being touted by the data vendors and consultants accrue to insurers, not their customers. In at least one case, if you have a “non-English-sounding” name, the impact is adverse. The counterargument from the apostles of big data is that the majority of people will benefit. Of course, that was arguably the logic used when schools were segregated, but that doesn’t justify the practice.
In the book “Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats of Toxic Tech,” the author points to an investigation of a correctional facility system that used proprietary algorithms to help decide bail terms, prison sentences and parole eligibility using various factors, some alleged to be discriminatory (e.g., arrest records of neighbors where the person lived). A Wall Street Journal article, “Google Has Chosen an Answer For You – It’s Often Wrong,” demonstrated how searches often indicated a bias or manipulation by whoever constructed the algorithms being used or by how the search parameters were entered by users. Errors in building replacement cost valuations are often blamed on incompetent or untrained data harvesters and users…. Even when the data is presumed to be accurate, it can be used incorrectly.
In 2016, I wrote an article for Independent Agent magazine titled, “The Six Worst Things to Happen to Insurance in the Past 50 Years.” No. 3 on my list was the growing obsession with data vs. people. When I write about these things, I know I run the risk of being characterized as the old man on his front porch yelling at the “disrupter” kids to get out of his yard, but I don’t think I’m a Luddite. I love and embrace technology.
I had a PC before IBM did. I still have the first iPod model. My phone is full of nifty apps. My son is a data scientist in the healthcare industry. I get it. But technology is a tool, not a religion. Far too many people treat technological innovation as sacrosanct and infallible, and anyone who questions or challenges its legitimacy and righteousness is committing heresy.
What’s next, a Snapchat invitation from an AI bot that says, “Welcome to the Insurance Matrix, Mr. Anderson”? Not yet, I hope.