I'll be quick this week because I'm headed to the airport to fly to InsureTech Connect in Las Vegas (where I hope to see many of you). But I wanted to share a major report on AI that, to my mind, should erase any sense of complacency about the need to quickly figure out the possibilities of AI for your organization.
The report says researchers found that generative AI already outperforms humans on roughly half of a wide range of real-world business tasks assigned to it, including many related to insurance.
As baseball legend Satchel Paige once said, "Don't look back. Something may be gaining on you."
The recent MIT study finding that 95% of AI projects haven't delivered a return on investment, along with a report about the "workslop" supposedly produced by gen AI, has created a sense that the bloom may be off the rose. I think both used flawed methodologies. In any case, both are thoroughly rebutted by a recent research paper from OpenAI.
A thorough article in Fortune says of the paper:
"Many AI benchmarks do not reflect real world use cases. Which is why a new gauge published by OpenAI... is so important. Called GDPval, the benchmark evaluates leading AI models on real-world tasks, curated by experts from across 44 different professions, representing nine different sectors of the economy. The experts had an average of 14 years experience in their fields, which ranged from law and finance to retail and manufacturing, as well as government and healthcare.
"Whereas a traditional AI benchmark might test a model’s capability to answer a multiple choice bar exam question about contract law, for example, the GDPval assessment asks the AI model to craft an entire 3,500 word legal memo assessing the standard of review under Delaware law that a public company founder and CEO, with majority control, would face if he wanted this public company to acquire a private company that he also owned."
Results varied by task and by which AI model was being used, but the AI's work was often startlingly better than the human experts'. As the Fortune article says, while researchers have talked about artificial general intelligence (AGI) as the Holy Grail, it may be better to think in terms of AJI (artificial jagged intelligence)--in other words, for some tasks, the AI is incredible; for others, not so much.
Plenty of caveats still apply. I always wonder about the rigor of a report produced by a vendor (even though this one seems plenty sound). I also continue to believe that the relevant test isn't humans vs. AI. AI will take over tasks, not full jobs, so the real test needs to be humans using AI vs. whatever the process is now--a la the "centaur" teams of humans and AI that compete against other such teams in chess.
But I still think the OpenAI research paper is important and commend it to your attention.
Cheers,
Paul
P.S. If you're looking for arguments in favor of AI use in your organization, you might also check out a recent report from Air Street Press. It says that capability per dollar spent by a user on AI "is doubling every few months. Google’s rate: 3.4 months. OpenAI’s: 5.8 months." The report also cites a study that found that "44% of U.S. businesses now pay for AI, up from 5% in 2023." There are loads of other interesting tidbits in there, too.
