When AI Doesn't Work

To keep us from getting carried away, it's good to look from time to time at the failures of AI to live up to the projections -- and COVID is a prime example.

Although I'm a big believer in the prospects for artificial intelligence, and we've certainly published a lot to that effect here at Insurance Thought Leadership, AI has also carried a ton of hype since it emerged as a serious field of study in the mid-20th century. I mean, weren't we supposed to be serving our robot overlords starting a decade or two ago?

To keep us from getting carried away, it's good to look from time to time at the failures of AI to live up to the projections, to see what AI doesn't do, at least not yet. And the attempts to apply AI to the diagnosis of COVID-19 provide a neatly defined study.

I've long believed in learning the lessons from failure, not just from successes. To that end, while Jim Collins had done the work on the patterns of success in Good to Great and Built to Last, I published a book (Billion Dollar Lessons, written with Chunka Mui) a decade-plus ago based on a massive research project into the patterns that appeared in 2,500 major corporate writeoffs and bankruptcies. You can't just look at the handful of people who, say, won millions of dollars at roulette and declare that betting everything on red is a good strategy; you have to look at the people who lost big at roulette, too, to get the full picture.

In the case of AI, a recent article from the MIT Technology Review found that, to try to help hospitals spot or triage COVID faster, "many hundreds of predictive tools were developed. None of them made a real difference, and some were potentially harmful.... None of them were fit for clinical use [out of 232 algorithms evaluated in one study]. Just two have been singled out as being promising enough for future testing."

Another study cited in the article "looked at 415 published tools and... concluded that none were fit for clinical use."

What went wrong? The biggest problem related to the data, which contained hidden problems and biases.

The article said: "Many [AIs] unwittingly used a data set that contained chest scans of children who did not have COVID as their examples of what non-COVID cases looked like. But as a result, the AIs learned to identify kids, not COVID."

One prominent model used "a data set that contained a mix of scans taken when patients were lying down and standing up. Because patients scanned while lying down were more likely to be seriously ill, the AI learned wrongly to predict serious COVID risk from a person’s position.

"In yet other cases, some AIs were found to be picking up on the text font that certain hospitals used to label the scans. As a result, fonts from hospitals with more serious caseloads became predictors of COVID risk."
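For readers who want to see how this kind of shortcut can be caught, here is a minimal sketch of one possible check, not anything taken from the article or the studies it cites. It asks how well a single piece of metadata, such as patient position, predicts the label all by itself. The records, the field names `position` and `severe`, and the helper `shortcut_accuracy` are all hypothetical and invented for illustration.

```python
from collections import Counter, defaultdict


def shortcut_accuracy(records, attribute, label):
    """Accuracy of guessing the label from one metadata attribute alone.

    For each value of the attribute, guess the most common label seen
    with that value, then count how many records that guess gets right.
    """
    by_value = defaultdict(Counter)
    for record in records:
        by_value[record[attribute]][record[label]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(records)


# Hypothetical toy data: position is strongly tied to severity,
# even though position says nothing about the disease itself.
records = (
    [{"position": "lying", "severe": True}] * 45
    + [{"position": "lying", "severe": False}] * 5
    + [{"position": "standing", "severe": True}] * 5
    + [{"position": "standing", "severe": False}] * 45
)

score = shortcut_accuracy(records, "position", "severe")
print(f"Accuracy from position alone: {score:.2f}")
```

If a nuisance attribute alone scores nearly as well as the full model, that suggests the model may be leaning on the attribute rather than on the scan. A low score does not prove the data is clean; it only rules out this one shortcut.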

Some tools also ended up being tested on the same data they were trained on, making them appear more accurate than they are.
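A toy illustration of why that matters, assuming nothing about how the actual tools were built: the sketch below uses a simple "nearest neighbor" model that memorizes its training examples, and the labels are pure random noise, so there is nothing real to learn. All names here (`nearest_label`, `accuracy`, the data itself) are hypothetical.

```python
import random


def nearest_label(train, x):
    """Return the label of the training example whose feature is closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(train, data):
    """Fraction of examples in data whose label the memorizing model gets right."""
    hits = sum(nearest_label(train, x) == y for x, y in data)
    return hits / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Each example is (feature, label); labels are coin flips unrelated to the feature.
data = [(rng.random(), rng.randint(0, 1)) for _ in range(400)]
train, held_out = data[:200], data[200:]

same_data_acc = accuracy(train, train)      # tested on its own training data
held_out_acc = accuracy(train, held_out)    # tested on data it has never seen

print(f"Tested on training data: {same_data_acc:.2f}")
print(f"Tested on held-out data: {held_out_acc:.2f}")
```

Scored against its own training data, the model looks perfect, because every example's nearest neighbor is itself. Scored against held-out data, it should do about as well as a coin flip, which is the honest answer when the labels are noise.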

Other problems included what's known as "incorporation bias" -- diagnoses or labels provided for the data before it was fed to the AI were treated as truth and "incorporated" into the AI's analysis even though those diagnoses and other labels were subjective.

I'll add, based on personal observation from 35 years of tracking AI, that it's tricky to manage, meaning that issues should be expected. The vast majority of senior executives don't have a technical background in information technology, let alone in AI, so it's hard for them to evaluate which AI projects will pan out and which should be set aside. Even those proposing the projects can't know with much precision ahead of time. They can identify areas as promising, but nobody can know that they'll hit an insight until that insight appears. Add the fact that AI carries an air of magic, which can give it the benefit of the doubt even when good, old humans might do a better job.

The article's main general recommendation happens to be the same prescription that Chunka and I offered at the end of Billion Dollar Lessons to help head off future disasters: generate some pushback.

In our case, dealing with corporate strategy, we recommended finding a "devil's advocate" who would look for all the reasons a strategy might fail. The person would then present them to the CEO, who otherwise is often fed a diet of affirmation by people trying hard to make the CEO's brainchild look brilliant. Our research found that 46% of corporate disasters could have been averted because the strategies were obviously flawed.

In the case of AI, experts quoted in the MIT Technology Review article recommend finding people who could look for problems in the data and for other biases. That advice should be extended to considerations of whether a project should be attempted in the first place and whether claims made on behalf of an AI should be tempered.

As I said, I firmly believe that AI will play a major role in transforming the insurance industry. There are already scores of examples of successful implementations. I just think we'll all be better off if we keep our eyes wide open and anticipate problems -- because AI is tricky stuff, and problems are out there. The more pitfalls we can avoid, the greater our likelihood of success.



Paul Carroll


Paul Carroll is the editor-in-chief of Insurance Thought Leadership.

He is also co-author of A Brief History of a Perfect Future: Inventing the Future We Can Proudly Leave Our Kids by 2050 and Billion Dollar Lessons: What You Can Learn From the Most Inexcusable Business Failures of the Last 25 Years and the author of a best-seller on IBM, published in 1993.

Carroll spent 17 years at the Wall Street Journal as an editor and reporter; he was nominated twice for the Pulitzer Prize. He later was a finalist for a National Magazine Award.

