As Algorithms Advance, Poor Data Limits Gains

Without end-to-end data control, ROI gets eaten up by misleading outputs and cleanup costs—fueling the old guard's skepticism about AI.


Many players in the insurance industry are gaining only a fraction of the returns they expected from their AI investments. That shortfall often fuels skepticism from legacy leadership who were wary of the technology to begin with. But too often, the problem isn't the algorithms themselves. It's the data. 

Even the most advanced models are useless without high-quality, reliable input. And the most effective way to ensure that quality is to mine the data yourself and maintain end-to-end control over how it's collected, filtered, and applied.

One Bad Data Point Can Poison the Well

One of the greatest fears organizations face when adopting AI is the risk of a single piece of inaccurate data slipping into the system and poisoning the well. In a 2024 McKinsey survey, 63% of respondents cited output inaccuracy as the top risk in their use of generative AI, up seven percentage points over the previous year. 

Yet despite this growing concern, many companies remain entranced by the glow of the term "AI" itself—rushing to implement tools without laying the groundwork for data integrity. Some fail to build the infrastructure to collect their own data; others skip the due diligence needed to properly vet third-party providers. The result is a brittle foundation: sophisticated models running on shaky inputs, with consequences that can quietly accumulate until something breaks.

In insurance, one of the most promising frontiers for AI is analyzing top producer performance—not just tracking who closes the most policies but understanding the subtle behaviors that lead to conversion: how often the producer follows up, what order they present options in, when they reach out. But if that behavioral data is sourced from generic CRMs or patchy third-party logs—where calls are logged inconsistently, meetings lack context, and outcomes aren't clearly tied to actions—then the AI will draw the wrong conclusions. 

Companies may end up reinforcing behaviors that correlate with success but don't actually cause it. That's no better than relying on gut instincts and locker-room advice from the old guard, except now there's money being sunk into a sophisticated model that's simply institutionalizing mediocrity. Worse still, if flawed data leads to the enshrinement of the wrong patterns, organizations could find themselves scaling exactly what holds their teams back.
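To make the failure mode concrete, consider a deliberately simplified sketch (the producers, log counts, and field names below are hypothetical): when logging discipline varies across a team, a model trained on generic CRM exports ends up measuring who logs diligently rather than who sells effectively.

```python
# A toy illustration (hypothetical producers and logs) of how inconsistent
# logging, rather than actual behavior, can become the "signal" a model learns.
call_logs = {
    # Producer A closes more business but logs calls sporadically.
    "producer_a": {"calls_logged": 6, "calls_made": 30, "policies_closed": 12},
    # Producer B logs everything but converts less.
    "producer_b": {"calls_logged": 25, "calls_made": 25, "policies_closed": 7},
}

for name, row in call_logs.items():
    print(
        f"{name}: model sees {row['calls_logged']} calls, "
        f"reality was {row['calls_made']}, closed {row['policies_closed']}"
    )
# From the logged data alone, more follow-up calls appear to mean fewer closed
# policies -- the model would discourage the very behavior driving conversion.
```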

The Myth of Clean-Up Later

Many argue that building end-to-end data collection from day one is too disruptive or expensive. The more convenient approach, they say, is to get the system up and running first, then "clean" the data later. But this logic backfires. 

By the time messy data filters into an AI model, it's already riddled with gaps, duplicates, and subtle inconsistencies that no amount of cleaning can fully resolve. You end up hiring teams of analysts just to guess at what really happened: Was that "client meeting" a strategic pitch or a casual coffee? Did an agent log a follow-up call because it occurred, or because it was expected? This kind of retroactive detective work burns time, erodes confidence, and costs far more in the long run than simply investing up front in clean, self-sourced data pipelines.
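A simplified sketch (with hypothetical records and field names) shows why: deduplication is easy to script, but when context was never captured in the first place, no script can recover whether a logged activity reflects something that really happened.

```python
# A minimal sketch of why retroactive cleanup turns into guesswork.
# Records and field names are hypothetical, not from any specific CRM.
from collections import Counter

legacy_activities = [
    {"id": 101, "type": "client meeting", "notes": "",       "outcome": ""},
    {"id": 102, "type": "client meeting", "notes": "coffee", "outcome": ""},
    {"id": 103, "type": "follow-up call", "notes": "",       "outcome": "quoted"},
    {"id": 103, "type": "follow-up call", "notes": "",       "outcome": "quoted"},  # duplicate entry
    {"id": 104, "type": "follow-up call", "notes": "",       "outcome": ""},        # logged out of habit?
]

def dedupe(activities):
    """Drop exact duplicates by id -- the easy part of cleanup."""
    seen, unique = set(), []
    for activity in activities:
        if activity["id"] not in seen:
            seen.add(activity["id"])
            unique.append(activity)
    return unique

def label_intent(activity):
    """The hard part: without context captured at the source,
    many records can only be labeled 'unknown' or guessed at."""
    if activity["type"] == "client meeting" and not activity["notes"]:
        return "unknown"  # strategic pitch or casual coffee?
    if activity["type"] == "follow-up call" and not activity["outcome"]:
        return "unknown"  # did the call happen, or was it merely expected?
    return "usable"

cleaned = dedupe(legacy_activities)
print(Counter(label_intent(a) for a in cleaned))
# Counter({'unknown': 2, 'usable': 2}) -- half the history is guesswork.
```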

Why Risk Assessment Isn't as Real-Time as It Should Be

Even in underwriting, arguably the most mature use case for AI in insurance, poor data collection quietly eats away at ROI. Many carriers have invested in models built to price risk with surgical precision, drawing on inputs like medical records, driving histories, IoT data, and lifestyle factors. But when those inputs are delayed, incomplete, or sourced from unvetted third parties, the model is left to make educated guesses. 

A single missing lab result, a misclassified occupation, or an outdated property inspection can tilt risk scores off course and trigger systemic mispricing. Worse, in trying to compensate for these blind spots, underwriters often revert to manual reviews or blanket restrictions, undoing the very efficiency and scalability AI was supposed to unlock.
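A toy calculation makes the point (the weights, fields, and default values below are invented for illustration, not drawn from any real pricing model): when a missing input silently falls back to a population average, the risk score shifts without anyone noticing.

```python
# A toy illustration of how one missing input can quietly tilt a risk score.
# Weights, fields, and defaults are hypothetical, not an actual pricing model.

POPULATION_DEFAULTS = {"a1c_lab": 5.6, "occupation_class": 2, "inspection_age_yrs": 3}
WEIGHTS = {"a1c_lab": 0.40, "occupation_class": 0.35, "inspection_age_yrs": 0.25}

def risk_score(applicant: dict) -> float:
    """Weighted sum; any missing field quietly falls back to the population default."""
    return sum(
        WEIGHTS[field] * applicant.get(field, POPULATION_DEFAULTS[field])
        for field in WEIGHTS
    )

complete    = {"a1c_lab": 8.1, "occupation_class": 4, "inspection_age_yrs": 10}
missing_lab = {"occupation_class": 4, "inspection_age_yrs": 10}  # lab result never arrived

print(risk_score(complete))     # 7.14 -- priced on the applicant's actual profile
print(risk_score(missing_lab))  # 6.14 -- a full point lower, purely from the data gap
```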

Tighter Rules, Higher Stakes

Maintaining end-to-end control over data collection and processing is no longer just a best practice. It's a way to stay ahead of compliance, especially as regulations tighten. In recent years, the U.S. has begun increasing oversight of AI and data protection, driven by mounting concerns over privacy and misuse. 

At the federal level, the proposed American Privacy Rights Act (APRA) of 2024 aims to establish comprehensive consumer data rights and enforce stricter standards for how personal information is collected and managed. States are moving in parallel. Tennessee's ELVIS Act, passed in March 2024, is the first U.S. law to directly address AI-generated impersonations, while Utah's Artificial Intelligence Policy Act creates penalties for companies that fail to disclose their use of generative AI in consumer interactions. 

Because insurers handle large volumes of sensitive data, these developments underscore the need for robust data governance.

Data as Differentiator

Beyond regulatory compliance, proprietary data offers a profound competitive advantage, especially in industries like insurance where nuance and historical context matter. Most companies build their AI models on generic, surface-level information, often scraped from the same third-party databases or public Web sources: what might be considered the "first page of Google" tier of data. But this kind of information is widely accessible and easily replicable, which means it rarely drives unique insight.

By contrast, companies that mine their own data across granular activity, customer engagement, behavioral signals, and operational workflows can generate insights that no competitor can duplicate. This differentiation becomes even more powerful when that proprietary data reveals subtle correlations invisible to broader datasets, such as which underwriter behaviors lead to fewer claims disputes or which policyholder interactions predict lifetime customer value.

In a market increasingly shaped by machine learning models, the organization with deeper, cleaner, and more exclusive data doesn't just win the compliance game; it outthinks the competition.

An Incremental, Holistic Approach

So how do companies begin actually building end-to-end data control? 

At first glance, the question can seem overwhelming, especially for legacy insurers juggling siloed systems, manual workflows, and decades of technical debt. But the key is to start small and build iteratively. Instead of trying to overhaul the entire data architecture in one sweep, leading organizations begin by instrumenting a single high-impact workflow, such as sales calls or underwriting touchpoints, with lightweight tracking tools. From there, they layer on automation: capturing interactions passively, syncing them across systems, and enriching them with context in real time.
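As a rough illustration, instrumenting a workflow can be as simple as defining a structured event that carries its own context from the moment of capture. The schema below is a hypothetical sketch, not a reference to any particular vendor's tooling.

```python
# A minimal sketch (hypothetical schema and field names) of instrumenting one
# workflow: each interaction is captured at the source with its context attached,
# instead of being reconstructed later from sparse CRM notes.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Optional
import json
import uuid

@dataclass
class InteractionEvent:
    producer_id: str
    channel: str                     # "call", "meeting", "email"
    intent: str                      # "pitch", "follow_up", "service" -- captured, not guessed
    policy_id: Optional[str] = None  # ties the action to a downstream outcome
    occurred_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    source_system: str = "dialer"    # enrichment added automatically at capture time

def capture(event: InteractionEvent) -> str:
    """Serialize the event for syncing into downstream systems."""
    return json.dumps(asdict(event))

print(capture(InteractionEvent("P-204", channel="call", intent="follow_up", policy_id="POL-88")))
```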

This phased approach reduces disruption while steadily increasing data visibility. Importantly, companies don't have to do it alone. Many are finding success by working with specialized vendors that embed into their existing infrastructure and quietly automate data capture behind the scenes. 

Over time, these efforts create a virtuous cycle. Cleaner data leads to better AI outputs, which in turn builds trust and momentum for deeper transformation.
