August 28, 2018
How Connected Data Can Help Stop Fraud
by Pete Aven
You can start to look at things like shared addresses, shared numbers and shared emails and see who is up to illegitimate activity.
Insurance companies with legacy systems can find it extremely challenging to bring their data together because of different data formats and system access methods. They might have multiple sources of content with similar information for customers, claims, agents and books of business. As in most organizations, that information was acquired over time or resulted from a merger. They might have mainframes and relational systems, and then they bring in third-party data. Add the fact that the insurance industry is invested in digital transformation, and you realize that the insurer’s relationship with the customer is changing. The relationship with the customer used to be managed by the agents. Now, insurers want to manage customers more directly and bring in that data themselves. The complexity of the underlying data sources and the data they want to bring together makes this difficult.
The challenge is trying to move all the data into some sort of central, unified location, but insurers are not able to do it at the scale that they would like. There are many attributes related to customers, policies and claims. So, instead of bringing all that data together and asking all the questions that insurers would like to ask, they cherry-pick three or four. They spend a lot of time writing extract, transform and load programs, as well as other data processing pipelines, to move data from the source systems into some sort of target schema. So, the day-to-day work is a lot of wrangling, churn, programming and data movement to answer only a small portion of the questions that companies would like to ask of the data.
Modeling Data to Detect Fraud
When it comes to fraud indicators, there are many signs that can be identified by the relationships in the data. For example, on a policy application for insurance, there are phone numbers, addresses and the relationship to an agent or an organization that sold the policy to the individual. If someone gets a policy with one agent and then tries to get a similar policy with a different agent, the applicant could be shopping around for the best deal, or the agent could be trying to sell someone a policy the person doesn’t need. But relational databases typically aren’t good at highlighting these types of issues.
In addition, while some things are more easily modeled as a graph, the hierarchical data in insurance is typically put into rows and columns in a tabular format. For example, in insurance, a book of business can belong to an organization or an agent, but an organization can have agents, which can each have a book of business. It’s a recursive model. If you want to understand the relationships and examine a policy tied to them, the analysis can get very complex. But when you put the data into a graph, where it is modeled as entities and relationships, you can quickly pattern match to see which individuals and agents have a relationship to a policy or application.
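The recursive model described above can be sketched in a few lines of Python. This is only an illustration, not a graph database: the node names, relationship labels and the reachable helper are all invented for the example. It shows how a single traversal answers "which policies sit under this organization?" regardless of whether the book of business hangs off the organization or one of its agents.

```python
from collections import defaultdict

# Hypothetical edges: (source, relationship, target). All names are invented.
EDGES = [
    ("org:Acme", "HAS_AGENT", "agent:Lee"),
    ("org:Acme", "OWNS_BOOK", "book:A"),
    ("agent:Lee", "OWNS_BOOK", "book:B"),
    ("book:A", "CONTAINS", "policy:100"),
    ("book:B", "CONTAINS", "policy:200"),
]


def build_adjacency(edges):
    """Index outgoing edges by source node."""
    adjacency = defaultdict(list)
    for source, _relationship, target in edges:
        adjacency[source].append(target)
    return adjacency


def reachable(adjacency, start, prefix):
    """Return the sorted nodes with the given prefix reachable from start."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return sorted(node for node in seen if node.startswith(prefix))


ADJACENCY = build_adjacency(EDGES)
policies_under_acme = reachable(ADJACENCY, "org:Acme", "policy:")
policies_under_lee = reachable(ADJACENCY, "agent:Lee", "policy:")
```

In a tabular model, the same question would typically need a self-join or a recursive query for each level of the hierarchy. Here the depth of the hierarchy does not change the code.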
A person should have only one relationship to a given type of policy. When you visualize the data and see that one person has two relationships to two similar policies, you can ask, “Why?” You can very quickly tease out that there is something there. If the pattern doesn’t match what you expect, the issue is quick and easy to identify.
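As a rough sketch of that check, assume each application has been reduced to a (person, policy, policy type, agent) record. The records and the duplicate_policy_types function below are hypothetical and exist only to illustrate the pattern.

```python
from collections import defaultdict

# Hypothetical (person, policy_id, policy_type, agent) records.
APPLICATIONS = [
    ("person:1", "policy:10", "term_life", "agent:A"),
    ("person:1", "policy:11", "term_life", "agent:B"),
    ("person:2", "policy:12", "auto", "agent:A"),
    ("person:2", "policy:13", "home", "agent:A"),
]


def duplicate_policy_types(applications):
    """Group policies by (person, type) and keep groups with more than one."""
    by_person_and_type = defaultdict(list)
    for person, policy, policy_type, agent in applications:
        by_person_and_type[(person, policy_type)].append((policy, agent))
    return {
        key: links
        for key, links in by_person_and_type.items()
        if len(links) > 1
    }


flagged = duplicate_policy_types(APPLICATIONS)
```

A flagged entry is a prompt to ask "Why?", not proof of fraud. As noted earlier, the applicant may simply be shopping around.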
There is a similar scenario for agents. Agents can sell certain policies and not others. When you model the data as a graph, you can see when an agent has an inappropriate relationship to a policy. A simple, one-line query can expose the agents who are engaging in this type of behavior. Also, when you visualize agents' relationships to the policies they are and aren’t allowed to write, a visible pattern emerges, which makes it easy to spot who is up to nefarious or questionable activities.
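To give a feel for how small that query can be, here is a sketch in plain Python rather than a graph query language. The LICENSED and WROTE data are made up, and a real system would pull both from the graph.

```python
# Hypothetical data: the policy types each agent may sell, and the
# (agent, policy_id, policy_type) relationships found in the data.
LICENSED = {
    "agent:A": {"auto", "home"},
    "agent:B": {"term_life"},
}
WROTE = [
    ("agent:A", "policy:1", "auto"),
    ("agent:A", "policy:2", "term_life"),
    ("agent:B", "policy:3", "term_life"),
]

# The "one-line query": every written policy whose type the agent may not sell.
violations = [(a, p, t) for a, p, t in WROTE if t not in LICENSED.get(a, set())]
```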
Using Data to Prevent Fraud
There is a lot of complexity in these organizations and in how agents, customers and insurers interact. If an insurance organization were going to start a modernization project around fraud investigation and fraud prevention, it should use technology that allows it to quickly manage information as a graph.
Property graphs are very adaptive; they are additive. Traditional data integration requires that you understand all your sources and all the attributes before you begin. Then you come up with a schema to encapsulate all the data, and that schema becomes the whole proposition. This encapsulation takes years, and no one ever hits the target because business sources and targets change. With graph technology, you can start to rapidly connect just the data you need as you need it and continue to add to those graphs to create a rich view of the data landscape. With a graph, you can start to tease things out and use the relationships, because addresses, phone numbers and emails become entities in their own right, each related to a person, policy or claim.
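The additive idea can be illustrated with a toy in-memory property graph. The PropertyGraph class below is a hypothetical sketch, not the API of any graph product. It shows that phone numbers can be added later as their own nodes without touching what was loaded first.

```python
class PropertyGraph:
    """A toy property graph: nodes carry properties, edges carry a label."""

    def __init__(self):
        self.nodes = {}     # node id -> dict of properties
        self.edges = set()  # (source, relationship, target)

    def add_node(self, node_id, **properties):
        # Create the node if needed, then merge in any new properties.
        self.nodes.setdefault(node_id, {}).update(properties)

    def add_edge(self, source, relationship, target):
        self.add_node(source)
        self.add_node(target)
        self.edges.add((source, relationship, target))

    def neighbors(self, node_id):
        """Return sorted ids of nodes connected to node_id in either direction."""
        found = set()
        for source, _relationship, target in self.edges:
            if source == node_id:
                found.add(target)
            elif target == node_id:
                found.add(source)
        return sorted(found)


graph = PropertyGraph()

# First pass: load only people and the policies they hold.
graph.add_node("person:1", name="Pat")
graph.add_edge("person:1", "HOLDS", "policy:10")
graph.add_edge("person:2", "HOLDS", "policy:11")

# Later pass: append phone numbers as entities; nothing above is rewritten.
graph.add_edge("person:1", "HAS_PHONE", "phone:555-0100")
graph.add_edge("person:2", "HAS_PHONE", "phone:555-0100")

people_on_phone = graph.neighbors("phone:555-0100")
```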
The reason to do this is that you can quickly load hundreds of thousands of policies, claims and applications into the system and start to look at things like shared addresses, shared phone numbers and shared email addresses. Very quickly, you can start to see who is up to legitimate activity and who is up to illegitimate activity. Phone numbers are one indicator. Fraudsters tend to use the same phone number for every phone field on a policy application. When you load these applications together and examine them at scale, you’ll see in the data that no one else has a relationship to the phone number the fraudsters have used. By contrast, it’s common to see people who are not engaged in fraud share phones in a home or office. You can tease out those relationships, as well.
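The phone-number indicator can be sketched as follows. The field names (home, work, mobile), the sample applications and both functions are assumptions made for illustration. The rule flags an applicant who uses one number in every phone field when nobody else in the data is linked to that number.

```python
from collections import defaultdict

PHONE_FIELDS = ("home", "work", "mobile")

# Hypothetical applications; the numbers are placeholders.
APPLICATIONS = [
    {"applicant": "person:1", "home": "555-0100", "work": "555-0100", "mobile": "555-0100"},
    {"applicant": "person:2", "home": "555-0111", "work": "555-0122", "mobile": "555-0133"},
    {"applicant": "person:3", "home": "555-0111", "work": "555-0144", "mobile": "555-0155"},
]


def people_by_phone(applications):
    """Treat each phone number as an entity and index the people linked to it."""
    index = defaultdict(set)
    for application in applications:
        for field in PHONE_FIELDS:
            index[application[field]].add(application["applicant"])
    return index


def suspicious_applicants(applications):
    """Flag applicants with one number in all fields that no one else shares."""
    index = people_by_phone(applications)
    flagged = []
    for application in applications:
        numbers = {application[field] for field in PHONE_FIELDS}
        if len(numbers) == 1 and len(index[next(iter(numbers))]) == 1:
            flagged.append(application["applicant"])
    return flagged


flagged_applicants = suspicious_applicants(APPLICATIONS)
shared_home_phone = people_by_phone(APPLICATIONS)["555-0111"]
```

In this sample, person:2 and person:3 share a home number, which matches the ordinary household pattern and is not flagged.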
Another example is address information. When you look at policy applications, the person’s address shouldn’t necessarily be the same as the employer’s address or the agent’s address. There is value in modeling the entities and relationships, so you can quickly identify who has the appropriate relationships to which entities. You can see if someone is even a policyholder, if the person has any relationship with the agent, if the person has the same address as the agent and so on. When you load all the data into the system, relationships allow you to quickly see the behaviors that connect the transactions. This is one of the key benefits of working with connected data.
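A minimal sketch of the address comparison might look like this. The addresses, the normalize helper and the shared_address_links function are hypothetical. Real address matching is much harder than lowercasing and trimming whitespace, so treat the normalization here as a stand-in.

```python
def normalize(address):
    """Crude normalization: lowercase, drop commas, collapse whitespace."""
    return " ".join(address.lower().replace(",", " ").split())


# Hypothetical address for each entity in the graph.
ADDRESS_OF = {
    "person:1": "12 Oak St, Springfield",
    "person:2": "99 Elm Ave, Springfield",
    "agent:A": "12 oak st,  springfield",
    "employer:X": "500 Main St, Springfield",
}

APPLICATIONS = [
    {"applicant": "person:1", "agent": "agent:A", "employer": "employer:X"},
    {"applicant": "person:2", "agent": "agent:A", "employer": "employer:X"},
]


def shared_address_links(applications, address_of):
    """Return (applicant, role, entity) where the applicant shares an address."""
    hits = []
    for application in applications:
        applicant_address = normalize(address_of[application["applicant"]])
        for role in ("agent", "employer"):
            entity = application[role]
            if normalize(address_of[entity]) == applicant_address:
                hits.append((application["applicant"], role, entity))
    return hits


address_hits = shared_address_links(APPLICATIONS, ADDRESS_OF)
```

A hit is a reason to look closer rather than a verdict, since an applicant can legitimately share an address with an employer or agent.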