Moving Beyond Data Lakes

Federated data graph technology can help carriers overcome long-standing obstacles, harness their data and fully unlock the AI moment. 

KEY TAKEAWAYS:

--One of the core problems that makes it difficult for large carriers to innovate with their data and IT strategy is the scattered architecture that results naturally from decades of growth and acquisitions. To address the challenge, many insurance IT leaders and consultants propose a central data lake or enterprise data warehouse that gathers all the data in one place. But such projects are extremely difficult to execute and typically take years and hundreds of millions of dollars to complete.

--There is an often-overlooked alternative that stems from the microservices architecture many startups have adopted: a federated data architecture. Instead of moving all the data from different sources into one central location, a query layer is built on top of existing data sources and gathers data only upon request. Because this approach requires no infrastructure for storing and maintaining a large volume of data, it is much easier to set up and maintain.

----------

With all the buzz around generative AI, P&C carriers are rushing to evaluate how and where to best apply this emerging technology. But is the insurance industry ready for this next wave of innovation, or are the same constraints that have limited real progress in the past a cause for concern?

There are positive signs. Recent innovations in data querying, caching, pipelining and transformation should give insurers reason for optimism. In fact, I’d argue that these innovations in underlying data architecture are as exciting for our industry as the changes we’re seeing in AI – if not more so. This article looks at how federated data graph technology can help carriers overcome long-standing obstacles in harnessing their data to fully unlock insurance’s AI moment. 

Insurers Can No Longer Afford to Underuse Data

One of the core problems that makes it difficult for large carriers to innovate with their data and IT strategy is the scattered architecture that results naturally from decades of growth and acquisitions. Of the top 10 P&C carriers in the U.S., the newest kid on the block was founded in 1937. This creates a multitude of challenges: Data engineers and analysts face huge barriers to deriving actionable insights across different systems, and every new initiative takes 10x the time because it involves multiple data migration projects.

To address the challenge of disparate data, many insurance IT leaders and consultants propose a central data lake or enterprise data warehouse that gathers all the data in one place. Although this approach can solve the problem, data lake and data warehouse projects are extremely difficult to execute and usually take years and hundreds of millions of dollars to complete.

Building the jobs that move all the data from different sources into one place is not easy, and even though there are open source solutions available, building and maintaining them requires skilled staff and can be prohibitively expensive. What’s more, once the data has been moved, it often requires significant transformation in the context of any given business use case.

In the case of mainframe data, for example, making even a minor change to the data format is non-trivial and may require workarounds because the people who know how to work with mainframe data are now few and far between. One global P&C insurance carrier we work with built a data lake, only to realize, after the multi-year project was completed, that they needed a way to transform the information from the data lake back into the mainframe format to keep their current business running. All this means that the promise of building applications on top of your data lake always seems “just around that next corner.”

Data Volumes Outpace Architecture 

According to Stanford University’s AI Index 2022, data is growing faster than Moore’s Law. In other words, the amount of data we collect tends to grow more quickly than our computing power and processing efficiency. This means data lake spending will only increase, just to maintain the large amount of data an insurance carrier collects year after year.

This issue manifests itself across the enterprise and is often felt acutely by front-line underwriters and operations staff who struggle to turn mountains of data into insights they can actually use to guide risk selection and portfolio management decisions. Underwriters routinely tell us that they aren’t swimming in data, they’re drowning. As a whole new generation of innovators continues to build more sophisticated data-driven insurance products – telematics, anyone? – these problems become worse, and the back-end IT challenge of data organization grows exponentially.

Consider a Federated Data Layer Versus a Data Lake

If an insurer is willing to pay and has the patience, the data lake may make sense long-term. But many carriers are under increasing pressure to implement new underwriting applications right now to improve workflows and boost underwriting productivity and performance. They’re also working to come to grips with emerging risks like climate-change-related natural catastrophes, cyber attacks and social and economic inflation. For insurers that do not have a decade to wait, there is an often-overlooked alternative that stems from the microservices architecture that many recent technology startups have adopted: a federated data architecture.

Instead of moving all the data from the different sources into one central location, a federated data layer is a query layer built on top of the insurer’s existing data sources that only gathers data upon request. What makes this approach much easier to set up and maintain is that there is no need to configure the architecture for storing and maintaining a large amount of data. 

Using open source solutions like GraphQL and Apollo, insurers can implement the queryable data layer in less time than it typically takes to establish a data lake. Once the queryable layer is in place, the bulk of the remaining work in setting up an agile, configurable federated data architecture is building out specific connectors for each source of data.
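To make this concrete, here is a minimal sketch of what such a queryable layer might look like with Apollo Server. The type names, stub connector functions and port are illustrative assumptions, not a production design or any specific carrier’s implementation:

```typescript
// Minimal sketch of a federated query layer (Apollo Server 4, Node ESM).
// The two "connector" stubs stand in for calls to existing systems of record,
// e.g. a policy admin system and a claims store. All names are illustrative.
import { ApolloServer } from "@apollo/server";
import { startStandaloneServer } from "@apollo/server/standalone";

const typeDefs = `#graphql
  type Claim {
    claimId: String!
    incurred: Float
  }
  type Policy {
    policyNumber: String!
    totalPremium: Float
    claims: [Claim!]!      # resolved lazily from a separate source
  }
  type Query {
    policy(policyNumber: String!): Policy
  }
`;

// Illustrative stub connectors: in practice these would query the underlying
// systems directly; no data is copied into a central store ahead of time.
async function fetchPolicy(policyNumber: string) {
  return { policyNumber, totalPremium: 125_000 };
}
async function fetchClaims(policyNumber: string) {
  return [{ claimId: `${policyNumber}-001`, incurred: 18_500 }];
}

// Each field is gathered only when a query asks for it.
const resolvers = {
  Query: {
    policy: (_parent: unknown, args: { policyNumber: string }) =>
      fetchPolicy(args.policyNumber),
  },
  Policy: {
    claims: (parent: { policyNumber: string }) => fetchClaims(parent.policyNumber),
  },
};

const server = new ApolloServer({ typeDefs, resolvers });
const { url } = await startStandaloneServer(server, { listen: { port: 4000 } });
console.log(`Federated query layer running at ${url}`);
```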

On top of shortening time-to-value versus a data lake, the federated data graph gives the end user the ability to access data in real time, which is great for building modern applications (for example, dynamic dashboards or workflows) on top of existing databases.
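For instance, a renewal dashboard built against the sketch above could issue a single request and receive live data, with the layer fanning the call out to the underlying systems at query time (the endpoint and field names remain illustrative assumptions):

```typescript
// Illustrative only: a dashboard client asking the query layer for fresh data.
const query = `
  query RenewalDashboard($policyNumber: String!) {
    policy(policyNumber: $policyNumber) {
      policyNumber
      totalPremium
      claims { claimId incurred }
    }
  }
`;

const response = await fetch("http://localhost:4000/", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ query, variables: { policyNumber: "CP-1042" } }),
});
console.log(await response.json()); // data pulled from the source systems on demand
```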

In an interview with Carrier Management, Greg Puleo, vice president of digital transformation at QBE North America, explained the power of a modern underwriting application that leverages an underlying federated data graph: “We now have the chassis that we can start to bolt other things to, and all those other data providers now just become an API [application programming interface] integration seamlessly in the workflow. The underwriters can make better decisions using that data without having to do extra steps.”

Challenges like data retention, change control and disaster recovery remain with the individual data sources, which most likely were set up to address those challenges in the first place. As the insurance industry moves from static analysis of historical data to more dynamic, AI-powered “predict and prevent” models, the ability for end users to access relevant data and insights in real time is essential.

A Few Caveats

A federated data layer is not a “solve it all” remedy for an insurer’s ills. There are real challenges in maintaining the schema as data changes, and building custom connectors is not always an easy task given the number of legacy databases still in use.

Today’s most popular policy administration systems and other core insurance systems are already 20-plus years old and were not designed for easy data access and sharing outside the system – and as insurance technologists know all too well, there are still mainframes and AS/400 midrange servers lurking in dark corners of the data center.

Insurers Don’t Have to Do It Alone

The right insurtech partner can be of enormous value in helping insurers build out a modern data architecture in lockstep with efforts to build new applications and workflows. Insurers should look for partners who share their vision of how better data can fundamentally transform insurance and who have demonstrated experience in employing advanced technologies and architectures to solve long-standing data issues. In addition to augmenting internal IT resources and expertise, an insurtech partner often serves as a forcing function, motivating internal IT teams to move projects to the finish line.

"From a business perspective, we weren’t looking for a vendor,” Thomas J. Fitzgerald, former president of commercial insurance at QBE North America told Carrier Management. “We were looking for a partner. We were looking for somebody who could ultimately come in and understand the myriad needs that we had, and had the flexibility and the agility to come along on a journey with us." 

As with any large-scale change, it’s essential to have a destination in mind and to focus on what you’re trying to improve for your end users and the business. In this way, you can avoid “data modernization for its own sake” and ensure that modernizing your architecture happens in the context of meaningful innovations to core insurance processes and workflows – things that actually benefit your users and lead to better business outcomes.

A Real-World Insurance Use Case

Let’s look at a real example of why a federated data graph can be advantageous from a business and end user perspective. Underwriters have three main levers that they can manipulate when balancing their portfolio: rate, retention and new business. The business challenge is that these levers often seem to work counter to one another. If you increase the rate for an account, for example, it may hurt your ability to retain the client when the policy comes up for renewal. It’s a constant balancing act for front-line underwriters to navigate the inevitable tradeoffs among rate, retention and new business.

So, let’s say you want to calculate your retention. Sounds easy, right? But not so fast. If policy administration information is dumped into a data lake or enterprise data warehouse, it is often dumped partially or without full context. For example, total premium on a property schedule and premium by coverage might be available for analysis, but premium by building/location or in relation to total insurable value (TIV) may not. 

Down the line, when the business wants to build a simple retention dashboard but chooses to calculate retention on a "same exposure" basis (i.e., accounting for changes in buildings or the value of those buildings, not just new premium/old premium), they often cannot do it. Inevitably, the data about the exposure base is trapped in two worksheets, one from each year, and so yet another worksheet is created to take those exposure bases and the premium values and calculate a simple metric.
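To make the arithmetic concrete, here is a rough sketch of a same-exposure retention calculation. The exact definition varies by carrier, so the formula, field names and numbers below are assumptions for illustration only:

```typescript
// Illustrative sketch: compare premium per unit of exposure (TIV) across terms
// so that added or removed buildings don't masquerade as a retention change.
interface PolicyTerm {
  premium: number; // written premium for the term
  tiv: number;     // total insurable value for the term
}

function sameExposureRetention(expiring: PolicyTerm, renewal: PolicyTerm): number {
  const expiringRate = expiring.premium / expiring.tiv;
  const renewalRate = renewal.premium / renewal.tiv;
  return renewalRate / expiringRate;
}

// Raw premium grew 10%, but TIV grew 20%: on a same-exposure basis the account
// actually renewed at roughly 92% of the expiring rate.
console.log(
  sameExposureRetention(
    { premium: 100_000, tiv: 50_000_000 },
    { premium: 110_000, tiv: 60_000_000 },
  ).toFixed(2), // "0.92"
);
```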

When an organization builds a business-centric application using a federated data graph with direct connections to data sources, it gets ahead of the transformations needed for simple core metrics like retention. A data graph forces the business to organize its data in a way that is aligned with how the business operates, saving an enormous amount of time and effort down the road.
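As a hypothetical illustration of what “organized the way the business operates” might mean here, a property schema could expose premium and insurable value at the building level, so a same-exposure metric can be computed directly rather than reassembled from spreadsheets (all type and field names are assumptions):

```typescript
// Hypothetical schema fragment: exposure detail lives where underwriters expect
// it -- at the building level -- rather than only as a rolled-up policy total.
const propertyTypeDefs = `#graphql
  type Building {
    locationId: String!
    tiv: Float!      # total insurable value for this building
    premium: Float!  # premium allocated to this building
  }
  type PropertyPolicy {
    policyNumber: String!
    term: String!    # e.g. "2023" vs. "2024", so terms can be compared
    totalPremium: Float!
    buildings: [Building!]!
  }
  type Account {
    accountId: String!
    policies(term: String): [PropertyPolicy!]!
  }
`;
export default propertyTypeDefs;
```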

In other words, without investing an appropriate amount of time in getting data into a format the business can actually use to measure its effectiveness and progress toward organizational goals, a data lake is simply a data sink. This often leaves future application builders helpless: the data lake contains hundreds of thousands of data points, but the relevant data they need remains inaccessible.

From the business perspective of empowering end users, a hybrid approach makes more sense than an “all or nothing” approach that forces application builders to put every app on top of a data lake; some applications are simply most effective when built in a federated way. Ultimately, the business and IT need to think about the form factor in which data needs to exist to empower their users to achieve their goals.

Summing Up

As the insurance industry eyes a potential AI arms race, the carriers that gain a real advantage from AI will be those that can harness their underused data to drive meaningful advances in core insurance processes. Federated, microservice-based architectures and data graph technology give insurers a viable alternative to data lakes as a means of tackling legacy tech debt and bringing much-needed agility and data-driven innovation to insurance.


William Steenbergen

William Steenbergen is CTO and co-founder of Federato, the insurance industry’s first RiskOps platform that embeds portfolio management and optimization into the core underwriting workflow.

The RiskOps platform’s underlying federated data graph, which enables a single-pane-of-glass view of client information, is key (that’s why the company is named Federato!).

As a researcher in the Human Computer Interaction Group and the Institute for Computational Mathematics at Stanford, Steenbergen has worked on state-of-the-art algorithms in reinforcement learning and dynamic optimization.
