For the first time ever, direct premiums in P&C exceeded $1 trillion in 2025. Also a first in 2025: a $14.6 billion alleged fraud ring was exposed. (The prior record was $6 billion.)
The watchword for industry executives should be: "entity."
Fraud risk, customer experience, and effective AI? They're all keyed to entity. The money you make, the money you keep, and the faster you grow? Entity, again.
That total of direct premiums means there are now more than one trillion reasons to understand who is paying you and who you are paying. That "who" is an "entity" -- people, businesses, and organizations.
Entities have identity – names, addresses, phone numbers, etc. In logical fashion, there are only three kinds of entities – trusted, unknown, and untrusted. If you can't distinguish among these three kinds, then you are reading the right article.
With interaction, entities also have history, behavior, and outcomes. Entities may be related to each other. Sometimes those relations are very transparent, like parent-and-child or employer-employee. Sometimes they are hidden, as in an organized crime ring or a collusive conspiracy. Entities may be multifaceted – driver, renter, business owner, group leader, member of an organization, neighbor, volunteer, relative, known associate. These relationships all change over time, yet the entity remains the same.
Pause and reflect on this. Consider yourself, for example, as EntityONE. Now quickly list all the roles and relationships you have in the physical world at your home, office, and neighborhood, and then online as an emailer, shopper, commentator, reader. Your identity in all those real and digital places may take different forms, but it is always you, EntityONE.
The everyday entity
In the day-to-day of insurance and business life, there is always a concern about fraud and abuse. From application through claims payment, your need to know your business extends from your new business funnel through third parties, vendors, customers, agents, and even staff.
A new person applies for car insurance, a business makes a claim involving a third party, an invoice arrives from a new address, an agent makes a submission, finance issues a payment – to trust or not to trust?
Names, addresses, phone numbers, etc. are the data vestiges of ways to describe an entity. Whether physical or digital in origin, these data are typically scattered across various boxes in an organization chart and across different core, ancillary, and API-accessed third-party systems.
We store identifier elements like names and addresses with varying lengths, spellings, inaccuracies, and levels of incompleteness, in unstructured and semi-structured data entry fields and in free-form text like notes and templates.
Then we store them again and again over time, moving between systems, between carriers, between vendors, and of course, across multiple CRM applications, which are additionally stuffed with all manner of duplicate and partial records.
Think of yourself as EntityONE
If you tried to have your own self, hereafter called EntityONE, appear the same in every field in every system in every organization over time, you would fail. Even if you never moved and never changed your name, random data entry error alone would ruin your ambition.
One data exercise to try at home: if you have address data from northern California, find a system where "city" is collected as part of an address, then see how many ways "San Francisco" appears. At one large carrier, with tens of thousands of transactions across five years of data entry, there were 97 unique entries.
The correct answer was the dominant response, "San Francisco." Shorthand like "SF" and nicknames like "SanFran," "Frisco," and "San Fran" were next. A lower-case version of the correct answer, "san francisco," followed, then all sorts of typos and transpositions. A case few anticipate is the space key accepted as a valid character – "S F" is now different from "SF" – and those spaces could be leading, trailing, or in the middle. Another very frequent response, when permitted by the system's field edit logic, was "blank": no entry at all or, in some cases, any number of space characters.
If you ran a literal matching algorithm on the "city" field, EntityONE could in theory have 97 different data "cities" yet still be only a single unique entity.
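To see how quickly literal matching falls apart, here is a minimal Python sketch. The handful of variants below is illustrative, not the carrier's actual 97; the point is that even basic case and whitespace normalization collapses several "cities" while nicknames and typos survive.

```python
from collections import Counter

# Illustrative raw "city" entries of the kind described above
# (not the carrier's actual 97 values).
raw_cities = [
    "San Francisco", "SF", "S F", " SF", "SanFran", "Frisco",
    "San Fran", "san francisco", "SAN FRANCISCO", "San Francsico",
    "San  Francisco", "", "   ",
]

# Literal matching: every distinct string is a distinct "city."
print(len(set(raw_cities)))  # 13 distinct literal values

def normalize_city(value: str) -> str:
    """Collapse case, squeeze internal whitespace, strip edges."""
    return " ".join(value.upper().split())

normalized = Counter(normalize_city(c) for c in raw_cities)
print(normalized)
# Case and whitespace cleanup merges several variants, but nicknames
# ("FRISCO") and typos ("SAN FRANCSICO") still survive -- which is why
# normalization is necessary but not sufficient.
```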
Some other factors might also contribute to your failure to have perfect EntityONE data.
One system has separate fields for first name and last name, with no field for middle name and no fields for title/prefix, or suffix. Another system has one long field where all of that is supposed to be entered. Is it Dr. or Mrs. or Ms or Miss with suffix MD, PhD, DO?
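A hedged illustration of that schema mismatch: the sketch below uses the open-source nameparser package (an assumption about tooling, not anything from the author) to split one free-form name field into the components another system stores separately.

```python
# pip install nameparser
from nameparser import HumanName

# The same person, as entered in a single free-form field in one
# system versus split fields in another (illustrative values).
free_form = HumanName("Dr. Pat Q. EntityONE Jr., MD")

print(free_form.title)   # e.g. "Dr."
print(free_form.first)   # e.g. "Pat"
print(free_form.middle)  # e.g. "Q."
print(free_form.last)    # e.g. "EntityONE"
print(free_form.suffix)  # e.g. "Jr., MD"

# A system with only first/last fields keeps "Pat" and "EntityONE"
# and silently drops the title, middle name, and suffixes -- data
# that can never be reconciled by literal field-to-field comparison.
```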
Generally, the simplest of contact information – name, address, phone number – can be entered and stored so inconsistently in so many places over time that even in the best of cases EntityONE would not exist as a whole and unique name-address.
When it comes to a legal entity – the EntityONE Family Trust, or your business version of EntityONE – it's still you, but you may now also have shared rights and not be the sole decision-maker. So enough of thinking of just yourself.
Think of how difficult it might be to search for your customer when their data is entered and maintained across different systems in different ways. Your decades-old processes still treat paper and data as if they were the entities, rather than recognizing the entities behind that paper and data.
This work process of literal data matching is at the core of delivering customer experience, but it leaves an opening for fraudsters and is the bane of AI.
Let this sink in: Data are not entities; entities have data.
Entities have data. You as EntityONE are unique. All the aliases, name changes, addresses, business titles, partnership and shareholder situations, and your honorifics aside, you are still you. Even after you pass away, the estate of EntityONE will persist.
Resolving the many ways to identify you is the problem you now need to turn inside out.
Every other person, business, group, and organization has the same issues. When you encounter any identity, you need to resolve it down to the core entity, or you will not know who you are dealing with.
Whether an entity is legal, illegal, foreign, or even sanctioned, many of the entities behind the identity data we see every day present as if their data is thin – seemingly little to none. Some appear squeaky clean. Some have long years of history. Some look like they popped out of thin air. Some, like a bad penny, keep turning up after we have decided not to interact with them. Synthetic, assumed, straw-man, takeover, hacked, phished, fraudulent, and other forms of malfeasance also exist.
Keeping tabs on entities (e.g. people and organizations), and the hidden relationships among them in real time is now practical with advanced analytics powered by a technology known as entity resolution. Entity resolution brings all the snippets of various identifiers around an entity into focus.
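As a rough intuition for what "bringing snippets into focus" means mechanically, here is a deliberately simplified sketch: records that share a strong identifier get clustered into one entity using union-find. Production entity resolution weighs many attributes probabilistically; this shows only the clustering skeleton, with made-up records.

```python
from collections import defaultdict

# Illustrative records: three views of one person, one stranger.
records = [
    {"id": 1, "name": "Pat EntityONE", "phone": "415-555-0101", "email": "pat@example.com"},
    {"id": 2, "name": "P. EntityONE",  "phone": "415-555-0101", "email": ""},
    {"id": 3, "name": "Pat E.",        "phone": "",             "email": "pat@example.com"},
    {"id": 4, "name": "Someone Else",  "phone": "415-555-0199", "email": "else@example.com"},
]

parent = {r["id"]: r["id"] for r in records}

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Link records that share any non-blank identifier value.
by_value = defaultdict(list)
for r in records:
    for key in ("phone", "email"):
        if r[key]:
            by_value[(key, r[key])].append(r["id"])
for ids in by_value.values():
    for other in ids[1:]:
        union(ids[0], other)

clusters = defaultdict(list)
for r in records:
    clusters[find(r["id"])].append(r["id"])
print(list(clusters.values()))  # [[1, 2, 3], [4]]
```

Records 1 and 2 share a phone, 1 and 3 share an email, so all three resolve to one entity even though no two name strings match literally.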
Entity resolution may involve several efforts, all claiming to do the same thing across your data- and computer-laden landscape. In the earliest days of computing, crazy-sounding technical terms sprouted to address this existential data identity issue of keeping EntityONE clearly in focus. It started field by field in databases and has modernized into complex multi-attribute vector and graph analytics.
These geeky but incomplete early algorithms left a lot undone while still showing some value. They had names like Levenshtein (an edit distance that counts the single-character changes needed to turn one string into another, useful for spotting typos), Hamming distance, and, more recently in AI terms, token-based Jaccard and cosine TF-IDF similarity approaches. There are dozens, if not hundreds, of challenger approaches. But an analytic or a technique is not a product or a solution.
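For the curious, here are minimal sketches of two of those named measures – edit distance and token overlap – which make plain both their value and their incompleteness.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum single-character edits (insert, delete, substitute)
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def jaccard(a: str, b: str) -> float:
    """Word-token overlap: |A & B| / |A | B|."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

print(levenshtein("San Francisco", "San Francsico"))  # 2: one transposition costs two edits
print(jaccard("EntityONE Family Trust", "The EntityONE Trust"))  # 0.5
```

Notice that neither measure knows "Frisco" and "San Francisco" are the same place; that knowledge has to come from somewhere beyond string mathematics.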
An early inventor combined a sequence of steps and orchestrated a set of code he called "fuzzy matching." (In memory of Charles Patridge, here is a link to a seminal paper he wrote.) Many data analytic communities shared that code and subsequent innovations to make progress on name and address standardization and matching. The postal service benefited greatly from more deliverable mail, database marketing boomed, customer analytics and lifetime value ascended, and so did provider, agent, and vendor scorecards with more ambitious service-level monitoring.
As with many other business problems, necessity is the mother of invention. Almost every company now has inventions that come from do-it-yourself, homegrown efforts. It is the only way forward before a workable, scalable solution is created.
Also likely installed are several versions and half-attempts at containing the problem inside an application or between systems. First, companies used data quality checks, then field validation efforts, then more hardened data standards. For all that work, the human data entry staff invented "99999" and other bypass work hacks. You can still see them today.
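Those bypass hacks are easy to surface, as in this small frequency-scan sketch (the values shown are illustrative placeholders, not real carrier data):

```python
from collections import Counter

# Sketch: surface suspicious "bypass" entries by frequency. Any value
# entered far more often than a real-world distribution allows
# (all-nines ZIPs, all-zero phones) is a candidate work hack.
zips = ["94103", "99999", "10001", "99999", "99999", "00000", "94103", "99999"]

suspects = {"99999", "00000", "12345"}  # common placeholder patterns
for value, n in Counter(zips).most_common():
    flag = "  <-- likely data-entry bypass" if value in suspects else ""
    print(f"{value}: {n}{flag}")
```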
This data is what you are training your AI models on.
The largest legacy problem today is this data pioneer spirit turned hubris. IT pros and data science teams do the best they can with what they have – full stop. That satisficing behavior limits their contribution. It also injects unneeded error into all the models they build and operationalize. Much of the AI risk is self-inflicted through poor entity resolution management. Actuarial staff may feel immune at the aggregated, triangle-and-spreadsheet point of view, but that is a false sense of security, since they cannot see into the granularity of the transactions beneath a spreadsheet cell. This is changing dramatically fast with the emergence of a machine learning- and AI-wielding corps of actuarial data scientists: employed professionals, academicians, and consultants.
New techniques like large language models (LLMs) are making short work of text data in all forms, creating new segmentation and features for existing models while enabling new modeling techniques to iterate faster. The next phase of workflow improvement is almost limitless. But all these breakthrough efforts need to be applied at the entity level to deliver their highest value.
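One common pattern, sketched here under the assumption of the open-source sentence-transformers package (an illustration, not a claim about any particular carrier's stack): embed free-form text once, then compare records by cosine similarity to generate features for existing models.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
notes = [
    "Insured reports rear-end collision at low speed, minor bumper damage.",
    "Claimant states vehicle was struck from behind; bumper scratched.",
    "Water damage to kitchen ceiling from upstairs unit.",
]
# normalize_embeddings=True makes the dot product equal cosine similarity.
emb = model.encode(notes, normalize_embeddings=True)
sims = emb @ emb.T

# The two collision notes should score far closer to each other
# than either does to the water-damage note, despite sharing few words.
print(float(sims[0][1]), float(sims[0][2]))
```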
The rise of industrial-grade entity resolution
The financial stress indices are high. The sympathy toward companies is low. The opportunity to use AI and seemingly anonymous internet connections makes people think they can't get caught – a presumption with a lot of truth to it these days.
A shout-out to our industry's career-criminal counterparts now enjoying the status of "transnational criminal organizations": terms like straw owners, encrypted messaging, assumed and stolen credentials, synthetic identities, and fake documentation are now everyday occurrences.
And that's just what relates to money. For truly awful perpetrators – anarchists, drug dealers, arms dealers, human traffickers, hackers, terrorists, spies, traitors, nation-state actors, and worse – the problem space of entity resolution is mission-critical.
Keeping tabs on entities (e.g., people and organizations) and the hidden relationships among them in real time is possible today. It elevates internal "good enough" learned implementations to "never finished, continuously adapting, real-time," data-driven implementations.
What you should do about entity
The most capable solutions sit alongside the efforts already in place, so there is no need to rip and replace anything. That makes prioritizing entity resolution easier, since it can be adopted with what you do now. It extends to your analytic ambitions in cyber resilience and digital modernization, too, interacting seamlessly with additional digital identifiers – emails, domains, IP addresses – which are the digital corollary of a street address in a neighborhood. (Here is an earlier article I wrote for ITL on "Your Invisible Neighbors and You.")
Do yourself, your board, your customers, and your future AI successes a favor and get serious about entity and entity resolution as the nearest thing to a single source of truth you can get.
Some Background
The author has built matching and fuzzy-matching applications multiple times, with multiple technologies, over a four-decade career, and advises that benchmarking is essential to understanding fitness for use in entity resolution. Four-out-of-five, or 80%, accuracy might be fine for some use cases and corporately negligent in others. Getting to the high 90s takes far more data and resources than most internal teams can dedicate on a sustained basis.
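What benchmarking means in practice, in a minimal sketch: score predicted match decisions against a hand-labeled truth set of record pairs, and look past "accuracy" to precision (false matches) and recall (missed matches), which carry very different business risk.

```python
def match_metrics(truth: list[bool], predicted: list[bool]) -> dict:
    """Compare match decisions against labeled truth pairs."""
    tp = sum(t and p for t, p in zip(truth, predicted))          # correct matches
    fp = sum((not t) and p for t, p in zip(truth, predicted))    # false matches
    fn = sum(t and (not p) for t, p in zip(truth, predicted))    # missed matches
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}

# Illustrative labeled pairs: True = same entity.
truth     = [True, True, True, False, False, True, False, True]
predicted = [True, True, False, False, True, True, False, True]
print(match_metrics(truth, predicted))
# {'precision': 0.8, 'recall': 0.8} -- a "four out of five" system can
# still pay a fraudulent claim (fp) or block a good customer (fn).
```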
A practical example from the author's experience is Verisk Analytics, which has billions of records of names and addresses coming from hundreds of carrier systems, all needing attribution at the entity level for highest business value. Verisk has instituted an industrial solution to supplement or replace methods the author's team originally built for fraud analytics.
The vendor Verisk gives testimonials for, Senzing, is now being adopted in insurance after widespread use globally in government and security, customer management, financial integrity, and supply chain use cases. Its methodology creates the capability to recognize relationships across data attributes and features shared among disparate records and systems – names, addresses, phone numbers, etc. – in real time.
Modern entity resolution systems can deploy inside your company as an SDK, so you never need to share any data to move forward. Multiple use cases around your enterprise can also benefit from entity resolution management that is reliable on the first shot.
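The shape of that in-process SDK pattern, in a purely hypothetical sketch (the class and method names below are illustrative inventions, not any vendor's actual API): records are added and searched locally, so identity data never leaves your environment.

```python
from collections import defaultdict

class LocalResolver:
    """Hypothetical stand-in for an in-process entity resolution SDK."""
    def __init__(self):
        self._index = defaultdict(set)   # (attribute, value) -> record ids
        self._records = {}

    def add_record(self, record_id: str, attributes: dict) -> None:
        self._records[record_id] = attributes
        for key, value in attributes.items():
            if value:
                self._index[(key, value)].add(record_id)

    def search(self, attributes: dict) -> set:
        hits = set()
        for key, value in attributes.items():
            hits |= self._index.get((key, value), set())
        return hits

resolver = LocalResolver()  # runs in-process; nothing is sent anywhere
resolver.add_record("C-1001", {"NAME": "Pat EntityONE", "PHONE": "415-555-0101"})
resolver.add_record("V-2002", {"NAME": "P. EntityONE", "PHONE": "415-555-0101"})
print(resolver.search({"PHONE": "415-555-0101"}))  # {'C-1001', 'V-2002'} (order may vary)
```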
