Insurance Should Lead with Data-Centric AI

Insurance must get beyond the hype of the "I" in AI and become more pragmatic in its use of AI/ML technologies for generating business insights.

After almost 15 years of increasingly effective technological developments in the AI/ML arena, we are at a point where the algorithms and the trained models are well-known and the overall architecture of neural nets is well-understood. However, the quest for reasonable ROI from AI/ML projects continues, with data issues impeding wider adoption.

The current and still-evolving technologies grouped loosely under the term data-centric AI could help many industries tackle their data issues and make meaningful progress in reaping the benefits of AI/ML sooner rather than later.

Insurance as an industry should take the lead in adopting data-centric AI technologies to provide a better customer experience for insureds.

Issues with data

The following issues are well documented, and solutions for many of them have been emerging for some time:

  • Sovereignty and regional aspects
  • Privacy and security
  • Accuracy
  • Bias and explainability, responsible and ethical AI/ML
  • Disruption
  • Unavailability of large data sets
  • Bespoke model training

Complying with current and emerging data-sovereignty regulations essentially requires insurers to train AI/ML models in-country or, in some cases, in the region where the models will be used to generate business insights. Because data reflects regional and local market dynamics, training models locally also helps avoid bias and makes them explainable in the local context.

Another issue is that the pandemic caused major disruptions to business data, weakening the efficacy of previously trained models. Because many insurers had built bespoke models in-house, they had to spend considerable time and effort retraining deployed models.

What Is Data-Centric AI?

Data-centric AI is loosely defined as AI/ML that depends on data engineered to a) account for domain-specific nuances while also factoring in the regional/local context, b) handle regulatory aspects such as the appropriate amount of anonymization, c) remove bias from training data, d) rely on smaller but relevant data sets when large data sets are not available and e) where useful, incorporate synthetic data generated by tools that try to maintain statistical similarity to real data.

This engineering of data goes beyond the traditional sourcing, cleaning and basic, algorithm-related tuning that happens today. Increasingly, tools to help visualize and engineer data are appearing in the market.

Data-centric AI and its related tools aim to enable business domain experts to manage AI/ML initiatives without needing a large team of data scientists and IT experts.
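Item b) of the definition above, handling regulatory aspects such as anonymization, is a good example of data engineering that goes beyond traditional cleaning. As an illustration only, here is a minimal Python sketch that assumes a hypothetical policy record with fields `policy_id`, `name`, `age`, `postcode` and `claim_amount`. It pseudonymizes the identifier with a keyed hash (HMAC-SHA256), generalizes age into bands, truncates the postcode and drops the name. The key, field names and generalization rules are assumptions, not a prescribed method, and pseudonymization alone is generally not considered anonymization under regulations such as the GDPR.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would come from a key-management system.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"

def pseudonymize(identifier: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same input always maps to the same token, so records can still be
    joined, but the original value cannot be recomputed without the key.
    Truncating to 16 hex characters is an illustrative choice.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def anonymize_record(record: dict) -> dict:
    """Pseudonymize, generalize or drop the fields of one hypothetical policy record."""
    return {
        "policy_token": pseudonymize(record["policy_id"]),
        "age_band": age_band(record["age"]),
        "postcode_prefix": record["postcode"][:3],  # keep only the coarse area
        "claim_amount": record["claim_amount"],      # analytic value kept as-is
        # "name" is dropped entirely
    }

# Hypothetical example record
record = {"policy_id": "POL-000123", "name": "Jane Doe", "age": 37,
          "postcode": "SW1A 1AA", "claim_amount": 2450.0}
print(anonymize_record(record))
```

How coarse to make each field (10-year age bands, a three-character postcode prefix) is the "appropriate amount" trade-off the definition refers to: coarser fields lower re-identification risk but also reduce the signal available to a model.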

What should the insurance industry do?

The insurance industry should look at data-centric advances in AI/ML and take the opportunity to lead in providing a better experience for insureds. Here are some suggestions for insurers as they embark on new AI/ML initiatives or, in some cases, revisit current ones:

  • Build on foundation models
    • Many industries increasingly rely on pre-trained models, using them as a foundation and adapting them to the needs of a specific organization. Rather than retraining the entire model, this approach trains only the specific areas that need improvement.
  • Use smaller but relevant data sets
    • The insurance industry is rich in data; however, it does not operate at the scale at which consumer-facing companies collect data, which lets them train machine learning models to ever-higher accuracy. Moreover, several questions have unclear or evolving answers. Who owns the data? Can the data be used for analysis? How comfortable is the industry with anonymization technologies?
      In this context, insurers should start engineering the small but relevant data sets that are readily available and can help improve model accuracy.

  • Evaluate use of newer anonymization technologies
    • Technologies that allow advanced analytics on encrypted data are maturing and should help insurers build business cases that involve their partners' data.
  • Build data engineering organizations – not just IT teams
    • AutoML technologies help shift the skills needed to use AI/ML to the left, meaning knowledgeable business analysts should be able to perform most of a data scientist's tasks. AutoML technologies have traditionally not focused on letting non-IT teams engineer data, but tools that help them do so for better model accuracy are emerging and increasingly contributing to the data-centric AI movement.
  • Use synthetic data selectively
    • Tools that generate synthetic data to supplement the smaller data sets insurers depend on today are gaining traction. While many of these tools may not remove bias and do not necessarily preserve the statistical integrity that effective models require, they are a good start. An easier way to begin is to apply synthetic data to the subset of an AI/ML system where accuracy is an issue.
  • Train models regionally
    • Growing data regulation in many countries may require AI/ML models to be trained in-country or in-region. This has added benefits: it can reduce bias in training, will probably make the algorithms' decisions easier to explain and may improve accuracy. In the past, however, training models locally and regionally has been hard to scale. With data-centric AI/ML tools augmented by AutoML tools, insurers should be able to set up a highly efficient operation for training models locally.
  • Build a framework and governance for responsible and ethical AI
    • The EU is leading the way in helping define a framework for responsible and ethical AI. Insurers should review the EU's work, treat data-centric AI technologies as the foundation for defining a bespoke framework for their business and set up governance to prevent and mitigate liabilities arising from their use of AI in business decisions.
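The "Use synthetic data selectively" suggestion notes that many generators do not necessarily preserve the statistical integrity of real data. The minimal Python sketch below illustrates the idea in its simplest form, assuming hypothetical toy data with two numeric columns (driver age and annual mileage in thousands). It fits each column's mean and standard deviation and samples new rows from independent normal distributions. It is a naive generator, not any particular vendor's tool: because columns are sampled independently, it preserves per-column means and spreads but not correlations between columns, and it does not enforce valid value ranges.

```python
import random
import statistics

def fit_column_stats(rows):
    """Return (mean, stdev) for each numeric column of a list of equal-length rows."""
    columns = list(zip(*rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]

def generate_synthetic(rows, n, seed=None):
    """Draw n synthetic rows, sampling each column independently from a normal
    distribution with that column's real mean and standard deviation.

    Assumption: columns are roughly normal. Correlations between columns and
    valid value ranges are NOT preserved by this naive sketch.
    """
    rng = random.Random(seed)
    stats = fit_column_stats(rows)
    return [[rng.gauss(mu, sigma) for mu, sigma in stats] for _ in range(n)]

def similarity_report(real, synthetic):
    """Pair up per-column (mean, stdev) of the real and synthetic data."""
    return list(zip(fit_column_stats(real), fit_column_stats(synthetic)))

# Hypothetical toy data: [driver age, annual mileage in thousands]
real_rows = [
    [34, 12.0], [45, 8.5], [29, 15.2], [52, 6.1],
    [38, 10.4], [61, 5.0], [26, 18.3], [47, 9.7],
]

synthetic_rows = generate_synthetic(real_rows, n=5000, seed=42)
for (real_mu, real_sd), (syn_mu, syn_sd) in similarity_report(real_rows, synthetic_rows):
    print(f"real mean {real_mu:6.2f}, sd {real_sd:5.2f} | "
          f"synthetic mean {syn_mu:6.2f}, sd {syn_sd:5.2f}")
```

Comparing summary statistics of real and synthetic data, as `similarity_report` does, is a cheap first check before synthetic rows are used to supplement a small training set; more capable generators also model correlations and categorical fields.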


It is imperative that the insurance industry get beyond the hype of the "I" in AI and become more pragmatic in its use of AI/ML technologies for generating business insights. The recent evolution of AutoML technologies has helped shift the required skills to the left and reduced dependency on data scientists and IT teams. However, many of the issues with data call for rethinking the use of AI/ML in a data-centric way: helping business domain experts engineer data and address many of the macro-issues and, in the process, improve the efficacy of trained models and keep them relevant over time.

Chak Kolli

Chak Kolli is the global chief technology officer for insurance at DXC Technology.

He is responsible for DXC’s global insurance software product and services strategy. He is also responsible for working with clients using new and emerging technologies to transform their business.

Prior to joining DXC Technology, Kolli led large global initiatives as a senior leader at TCS and AIG.

He has a Ph.D. in computer science from George Washington University.
