As volumes of unstructured data explode across legacy platforms, cloud apps, and shadow IT, many insurance carriers are shouldering data risk that can undermine their mission.
Siloed platforms and manual recordkeeping allow information to accumulate unchecked across the data estate. Obsolete files collected over decades mingle with sensitive, business-critical data, making it difficult for insurers to understand what data they possess, where it's located, who should have access, and how long it should be kept.
The growing risk of unstructured data sprawl
The risk is real: a disjointed, chaotic data estate makes regulatory penalties more likely and widens cybersecurity vulnerabilities.
In response, lawmakers and regulators are pushing insurers to manage their data responsibly.
In the U.S., 28 jurisdictions have adopted the National Association of Insurance Commissioners (NAIC) Insurance Data Security Model Law (MDL-668); New York separately enforces 23 NYCRR 500, which sets rigorous cybersecurity requirements for insurers. Meanwhile, privacy laws like California's CCPA/CPRA and the EU's GDPR compel firms to disclose how long they retain each category of personal information and to delete it when no longer needed. Together, these regulations push companies to govern the data they keep and defensibly dispose of what they don't need.
If the threat of regulatory penalties won't compel companies to manage their data, the threat of a costly data breach might: The average cost of a data breach in financial services reached $5.6 million in 2025, according to IBM. Minimizing and governing personal data directly reduces exposure.
But the difficulty is high: with 90% of organizational data being unstructured, firms have a hard time understanding and correctly classifying it. Unlike structured data, which conforms to a pre-defined schema or data model, unstructured data (documents, emails, and media files, for example) is inherently free-form, with no fixed shape or format. Until you open that Word document, the only clue to its contents is its metadata: high-level details like name and file size. As a result, teams spend a lot of time searching for the content they need or replicating it unnecessarily.
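To see how little metadata reveals, here's a minimal Python sketch that prints everything the filesystem knows about a document without opening it; the folder and file extension are placeholders, not part of any real tool.

```python
from datetime import datetime
from pathlib import Path

def describe(path: Path) -> None:
    """Print everything the filesystem reveals without opening the file."""
    st = path.stat()
    print(f"{path.name}: {st.st_size} bytes, "
          f"modified {datetime.fromtimestamp(st.st_mtime):%Y-%m-%d}")
    # Whether the contents are a claims settlement or a cafeteria menu
    # stays unknown until the document is actually parsed.

for path in Path(".").glob("*.docx"):  # point at any folder of documents
    describe(path)
```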
In a poll by analyst firm IDC, 22% of IT decision-makers said unstructured data is unnecessarily replicated, and just 58% of unstructured data is ever reused after it is first created.
With AI, governance becomes an enabler
Data governance has traditionally been an afterthought, necessary purely for the sake of reducing risk. Organizations may only consider data governance in the wake of a data breach or a failed audit. But with the advent of AI, it has a new selling point: innovation and growth.
Organizations adopting AI (which is to say, most of them) need to focus on their data, because AI produces its best results only with high-quality, trusted, compliant data. Organizations that prioritize strong data governance can feed their AI platforms data they're confident is authentic and reliable, free of bias and error, and respectful of individuals' privacy.
This is particularly important in the financial services industry, which is heavily regulated and handles both personally identifiable information (PII) and payment card information (PCI). An insurer looking to adopt AI, even for a limited use case, needs to be able to trust its data and demonstrate compliance with privacy and financial regulations. The NIST AI Risk Management Framework (AI RMF) offers guardrails for doing so.
A modernization blueprint: establishing enterprise "data trust"
So there you have it: two arguments for the centrality of data governance to the financial services industry, the stick (reduced risk) and the carrot (AI-driven innovation). For insurers nervous about compliance or looking to grow with AI, what does this look like in practice?
1. Inventory and classify at scale
Good data governance starts with developing an understanding of your data, both structured and unstructured, across file shares and SaaS platforms, so you can trust it and establish its provenance. This isn't a point-in-time review; you need to do it continuously, at scale. Automated discovery and classification tools can help, minimizing the risk of human error.
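As a rough illustration of what automated classification does, the Python sketch below walks a file share and flags text files containing PII-like patterns. The mount point, file types, and regexes are illustrative assumptions; production tools add parsers for Office and PDF formats, validated detectors, and connectors to SaaS platforms.

```python
import re
from pathlib import Path

# Illustrative patterns only; real classifiers use validated detectors
# and ML models, not bare regexes.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify_file(path: Path) -> set[str]:
    """Return the PII-like patterns found in a plain-text file."""
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        return set()
    return {name for name, rx in PII_PATTERNS.items() if rx.search(text)}

def scan_share(root: str) -> None:
    """Walk a share and report files that contain sensitive patterns."""
    for path in Path(root).rglob("*.txt"):  # extend via .docx/.pdf parsers
        hits = classify_file(path)
        if hits:
            print(f"{path}: {', '.join(sorted(hits))}")

scan_share("/mnt/claims_share")  # hypothetical mount point
```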
2. Codify retention and legal holds
Once you know your data, and you understand the regulations you are subject to, you can apply relevant retention schedules and implement legal holds when required.
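One way to make a retention schedule enforceable is to codify it as data rather than prose. The sketch below uses hypothetical record categories, periods, and hold IDs (none of which are legal advice) to show the core check: a record is disposable only when it is past its retention period and not under legal hold.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical schedule; real periods come from counsel and regulators.
RETENTION_DAYS = {
    "closed_claim": 7 * 365,
    "policy_application": 5 * 365,
    "marketing_email": 2 * 365,
}

LEGAL_HOLDS = {"claim-2021-0042"}  # record IDs frozen by litigation

@dataclass
class Record:
    record_id: str
    category: str
    created: date

def is_disposable(rec: Record, today: date) -> bool:
    """A record is disposable only past retention and off legal hold."""
    if rec.record_id in LEGAL_HOLDS:
        return False
    period = RETENTION_DAYS.get(rec.category)
    if period is None:
        return False  # unknown category: keep it and escalate for review
    return today - rec.created > timedelta(days=period)

rec = Record("claim-2018-0007", "closed_claim", date(2017, 3, 1))
print(is_disposable(rec, date(2025, 6, 1)))  # True: past the 7-year period
```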
3. Review data access and sharing
A study by Concentric found that 15% of business-critical resources were at risk of oversharing. Audit your teams' access to data and implement a zero-trust, least-privilege approach that grants users access only to the tools and data their role requires.
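A simple audit can surface the worst offenders before a full zero-trust rollout. This sketch assumes a hypothetical export of sharing records as (resource, principal) pairs and flags anything granted to broad groups or external domains; the domain and sample data are invented.

```python
# Hypothetical export of sharing records as (resource, principal) pairs.
BROAD_PRINCIPALS = {"Everyone", "All Company", "Anyone with the link"}
INTERNAL_DOMAIN = "example-insurer.com"  # assumption

shares = [
    ("claims/2023/payout_report.xlsx", "Everyone"),
    ("hr/salaries.xlsx", "jane@example-insurer.com"),
    ("underwriting/model.docx", "partner@outside-vendor.com"),
]

def flag_oversharing(shares):
    """Yield (resource, reason) pairs for risky sharing grants."""
    for resource, principal in shares:
        if principal in BROAD_PRINCIPALS:
            yield resource, "shared with a broad group"
        elif "@" in principal and not principal.endswith("@" + INTERNAL_DOMAIN):
            yield resource, f"shared externally with {principal}"

for resource, reason in flag_oversharing(shares):
    print(f"REVIEW {resource}: {reason}")
```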
4. Minimize data and remove the ROT
Privacy regulations like GDPR or CPRA require entities to record and implement data minimization measures. Once your sensitive customer data reaches the end of its retention period, remove it. You can also remove the ROT (redundant, obsolete and trivial data) clogging your systems. ROT makes it harder to comply with privacy or records regulations, increases the attack surface for a data breach, and means AI models may provide substandard or noncompliant outputs.
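Redundant and obsolete files are often detectable mechanically. The sketch below, which assumes a hypothetical archive mount and a five-year staleness threshold, flags duplicates by content hash and stale files by last-modified date; the "trivial" part of ROT is harder to automate and usually needs business rules.

```python
import hashlib
from datetime import datetime, timedelta
from pathlib import Path

STALE_AFTER = timedelta(days=5 * 365)  # assumption: untouched for five years

def rot_report(root: str) -> None:
    """Flag redundant (duplicate) and obsolete (stale) files under root."""
    seen: dict[str, Path] = {}
    now = datetime.now()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
        except OSError:
            continue  # unreadable file: skip here, log in a real tool
        if digest in seen:
            print(f"REDUNDANT {path} duplicates {seen[digest]}")
        else:
            seen[digest] = path
        if now - datetime.fromtimestamp(path.stat().st_mtime) > STALE_AFTER:
            print(f"OBSOLETE {path} untouched for 5+ years")

rot_report("/mnt/archive_share")  # hypothetical mount point
```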
How to start: a 90-day, high-impact pilot
This is an organization-wide effort, but it needn't be daunting. Get started by attacking one high-risk data issue: an excess of claims documents, tangled producer mailboxes, or SharePoint sprawl. Prove value fast by making an obvious dent in storage volumes and helping one business unit achieve its goals faster.
Trusted, minimized data lowers risk and funds transformation. By taming the unstructured data beast, insurers lower their own data risk and position themselves to succeed in an AI-focused landscape, while preserving the trust of their customers.