Tag Archives: data science

The New Insurance Is No Insurance

Insurers are aware that technology will help to reduce claims drastically and therefore finally run premiums down to unsustainable levels.

Time to move on

“Insurance is a cornerstone of modern life. Without insurance, many aspects of today’s society and economy could not function. The insurance industry provides the cover for economic, climatic, technological, political and demographic risks that enables individuals to go about their daily life and companies to operate, innovate and develop.” Source: Insurance Europe

I fully support this, but the way this cornerstone fits into modern life needs attention. It’s time to move on.

See also: Insurance Coverage Porn  

The Third Wave

Twenty years ago, I set up the first digital insurance. Ten years ago, I set up (again the first) mobile insurance Now we’re heading for the third wave: connected insurance.

Real connected insurance with the new opportunities that technology brings is what I (with a few former colleagues) believe in and have been working on for some time now.

Of course: “IoT,” “data science,” “AI,” “customer-centric” and “on-demand” are the buzzwords. But let me add two: “holistic” and “transversal.”

“Holistic” refers to the complete modern household with a connected lifestyle and “transversal” to the consumer who is completely not interested in our industry verticals.

Insurance has to stay but with an overall and fresh approach. Hundreds of insurtech initiatives are currently taking pieces and add sometimes compelling features. See the Sherpa-Neos-Interpolis-Trov-CBien-Metromile-Vitality-Clark-Knip-PolicyGenius-Lemonade-Inshared-like initiatives.

The real challenge is to bring it all together to a compelling, simple, transparent and engaging full service offer to the customer.

Focus on prevention

The new insurance is no insurance — meaning the focus should not be on pushing insurance products but on offering prevention services.

For this we developed an international concept for smart protection called InConnect, with the household as hub connecting all smart devices, vehicles and wearables and with technology and data used to improve prevention.

Safety and Peace of Mind

Unbiased personal risk management tools (Primes) help reduce insurance to what is really needed in one universal personalized policy without redundancies and gaps, dynamically adjusted to the actual situation and needs with:

  • On-demand add-ons
  • Built-in loyalty and reward system
  • Ready connections for sharing cars, rides and homes
  • Easy combining or splitting households
  • Privacy and cyber risk recognized

and backed with a one of a kind insurance and claims IT platform.


The overall concept is quite ambitious. Although we’ve successfully done ambitious businesses before and have qualified people and technology on our side, we’ll start on a controllable scale. A startup will kick off in three European countries with home, motor and travel.

As soon as we’ve completed our search for the right partners, we’ll start proving the concept, do the learning and keep you posted. Of course we’re always looking for enthusiastic and good individuals. Feel free to give me a buzz.

See also: The Insurance Model in 2035?  


Insurers are experts in risk and capital management, and that is what they should keep doing, but in a different perspective. Deploy that expertise in the new environment of connected lifestyles.

Data Science: Methods Matter (Part 4)

Putting a data science solution into production after weeks or months of hard work is undoubtedly the most fun and satisfying part. Models do not exist for their own sakes; they exist to make a positive change in the business. Models that are not in production have not realized their true value. Putting models into production involves not only testing and implementation, but also a plan for monitoring and updating the analytics as time goes on. We’ll walk through these in a moment and see how the methods we employ will allow us to get the maximum benefit from our investment of time and effort.

First, let’s review briefly where we’ve been. In Part 1 of our series on Data Science Methods, we discussed CRISP-DM, a data project methodology that is now in common use across industries. We looked at the reasons insurers pursue data science at the first step, project design. In Part 2, we looked at building a data set and exploratory data analysis. In Part 3, we covered what is involved in building a solution, including setting up the data in the right way to validate the solution.

Now, we are ready for the launch phase. Just like NASA, data scientists need green lights across the board, only launching when they are perfectly ready and when they have addressed virtually every concern.

See also: The Science (and Art) of Data, Part 2  

Test and Implement

Once an analytic model has been built and shown to perform well in the lab, it’s time to deploy it into the wild: a real live production environment. Many companies are hesitant to simply flip a switch to move their business processes from one approach to a new one. They prefer to take a more cautious approach and implement a solution in steps or phases. Often, they choose to use either an A/B test and control approach or a phased geographic deployment. In an A/B test approach, the business results of the new analytic solution are compared with the solution that has been used in the past. For example, 50% of the leads in a marketing campaign are allocated to the new approach while 50% are allocated to the old approach, randomly. If the results from the new solution are superior, then it is fully implemented and the old solution removed. Or, if results in one region of the country look promising, then the solution can be rolled out nationwide.

Depending on the computing platform, the code base of the analytic solution may be automatically dropped into existing business processes. Scores may be generated live or in batch, depending on the need. Marketing, for instance, would be a good candidate to receive batch processed results. The data project may have been designed to pre-select good candidates for insurance who are also likely respondents. The results would return an entire prospect group within the data pool.

Live results meet a completely different set of objectives. Giving a broker a real-time indication of our appetite to quote a particular piece of business would be a common use of real-time scoring.

Sometimes, to move a model to production, there’s some coding that needs to happen. This occurs when a model is built and proven in R, but the deployed version of the model has to be implemented in C for performance or platform considerations. The code has to be translated into the new language. Checks must be performed to confirm that variables, final scores and the passing of correct values to end-users are all correct.

 Monitor and Update

Some data projects are “one time only.” Once the data has appeared to answer the question, then business strategies can be addressed that will support that answer. Others, however, are designed for long-term use and re-use. These can be very valuable over their periods of use, but special considerations must be taken into account when the plan is to reuse the analytic components of a data project. If a model starts to change over time, you want to manage that change as it happens. Monitoring and updating will help the project hold its business value, as opposed to letting its value decrease as variables and circumstances change. Effective monitoring is insurance for data science models.

For example, a model designed to repeatedly identify “good” candidates for a particular life product may give excellent results at the outset. As the economy changes, or demographics change, credit scoring may exclude good candidates. As health data exchanges improve, new data streams may be better indicators of overall health. Algorithms or data sets may need to be adapted. Minor tweaks may be needed or a whole new project may prove to be the best option if business conditions have drastically changed. Monitoring the intended business results compared with results at the outset and results over time will allow insurers to identify analysis features that no longer provide the most valid results.

See also: Competing in an Age of Data Symmetry

Monitoring is important enough that it goes beyond running periodic reports and having hunches that the models have not lost effectiveness. Monitoring needs its own plan. How often will report(s) run? What are the criteria we can use to validate that the model is still working? Which indicators will tell us that the model is beginning to fail? These criteria are identified by both the data scientists and the business users who are in touch with the business strategy. Depending on the project and previous experience, data scientists may even know intuitively which components within the method are likely to slide out of balance. They can create criteria to monitor those areas more closely.

Updating the model breathes new life into the original work. Depending on what may be happening to the overall solution, the data scientist will know whether a small tweak to a formula is called for or an entirely new solution needs to be built based on new data models. An update saves as much of the original time investment as possible without jeopardizing the results.

Though the methodology may seem complicated, and there seem to be many steps, the results are what matter. Insurance data science continually fuels the business with answers of competitive and operational value. It captures accurate images of reality and allows users to make the best decisions. As data streams grow in availability and use, insurance data science will be poised to make the most of them.

Data Science: Methods Matter (Part 3)

Data science has grown in inevitability as it has grown in value. Many organizations are finding that the time they spend in carefully extracting the “truth” from their data is time that pays real dividends. Part of the credit goes to those data scientists who conceived of a data science methodology that would unify processes and standardize the science. Methods matter.

 In Part 1 and Part 2 of our series on data science methods, we set the stage. Data science is not very different from other applied sciences in that it uses the best building blocks and information it can to form a viable solution to an issue, whatever that issue may be. So, great care is taken to make sure that those building blocks are clean and free from debris. It would be wonderful if the next step were to simply plug the data into the solution and let it run. Unfortunately, there is no one solution. Most often, the solution must be iteratively built.

This can be surprising to those who are unfamiliar with data analytics. “Doesn’t a plug-and-play solution just exist?” The answer is both yes and no. For example, repeat analytics, and those with fairly simple parameters and simple data streams, reusable tools and models, do exist. However, when an organization is looking for unique answers to unique issues, a unique solution is the best and only safe approach. Let’s consider an example.

See also: Forget Big Data — Focus on Small Data

In insurance marketing, customer retention is a vital metric of success. Insurance marketers are continually keeping tabs on aspects of customer behavior that may lead to increasing retention. They may be searching for specific behaviors that will allow them to lower rates for certain groups, or they may look for triggers that will help the undesired kind of customer to leave. Data will answer many of their questions, but knowing how to employ that data will vary with every insurer.

For example, each insurer’s data contains the secrets to its customer persistency (or lack thereof), and no two insurers are alike. Applying one set of analytically derived business rules may work well for one insurer — while it would be big mistake to use the same criteria for another insurer. To arrive at the correct business conclusions, insurers need to build a custom-created solution that accounts for their uniqueness.

Building the Solution

In data science, building the solution is also a matter of testing a variety of different techniques. Multiple models will very likely be produced in the course of finding the solution that produces the best results.

Once the data set is prepared and extensive exploratory analysis has been performed, it is time to begin to build the models. The data set will be broken into at least two parts. The first part will be used for “training” the solution. The second portion of the data will be saved for testing the solution’s validity. If the solution can be used to “predict” historical trends correctly, it will likely be viable for predicting the near future as well.

What is involved in training the solution? 

 A multitude of statistical and machine-learning techniques can be applied to the training set to see which method generates the most accurate predictions on the test data. The methods chosen are largely determined by the distribution of the target variable. The target variable is what you are trying to predict.

A host of techniques and criteria are used to determine which technique will work best on the test data. There is a bucketful of acronyms from which a data scientist will choose (e.g. AUC, MAPE and MSE). Sometimes business metrics are more important than statistical metrics for determining the best model. Simplicity and understandability are two other factors the data scientist takes into consideration when choosing a technique.

Modeling is more complex than simply picking a technique. It is an iterative process where successive rounds of testing may cause the data scientist to add or drop features based upon their predictive strengths. Not unlike underwriting and actuarial science, the final result of data modeling is often a combination of art and science.

See also: Competing in an Age of Data Symmetry

What are data scientists looking for when they are testing the solution?

Accuracy is just one of the traits desired in an effective method. If the predictive strength of the model holds up on the test data, then it is a viable solution. If the predictive strength is drastically reduced on the test data set, then the model may be overfitted. In that case, it is time to reexamine the solution and finalize an approach that generates consistently accurate results between the training data and the test data. It is at this stage that a data scientist will often open up their findings to evaluation and scrutiny.

To validate the solution, the data scientist will show multiple models and their results to business analysts and other data scientists, explaining the different techniques that were used to come to the data’s “conclusions.”  The greater team will take many things into consideration and often has great value in making sure that some unintentional issues haven’t crept into the analysis. Are there factors that may have tainted the model? Are the results that the model seems to be generating still relevant to the business objectives they were designed to achieve? After a thorough review, the solution is approved for real testing and future use.

In our final installment, we’ll look at what it means to test and “go live” with a data project, letting the real data flow through the solution to provide real conclusions. We will also discuss how the solution can maintain its value to the organization through monitoring and updating as needed based on changing business dynamics. As a part of our last thoughts, we will also give some examples of how data projects can have a deep impact on the insurers that use them — choosing to operate from a position of analysis and understanding instead of thoughtful conjecture.

The image used with this article first appeared here.

Data Science: Methods Matter (Part 2)

What makes data science a science?


When data analytics crosses the line with simple formulas, much conjecture and an arbitrary methodology behind it, it often fails in what it was designed to do —give accurate answers to pressing questions.

So at Majesco, we pursue a proven data science methodology in an attempt to lower the risk of misapplying data and to improve predictive results. In Methods Matter, Part 1, we provided a picture of the methodology that goes into data science. We discussed CRISP-DM and the opening phase of the life cycle, project design.

In Part 2, we will be discussing the heart of the life cycle — the data itself. To do that, we’ll take an in-depth look at two central steps: building a data set, and exploratory data analysis. These two steps compose the phase that is  extremely critical for project success, and they illustrate why data analytics is more complex than many insurers realize.

Building a Data Set

Building a data set, in one way, is no different than gathering evidence to solve a mystery or a criminal case. The best case will be built with verifiable evidence. The best evidence will be gathered by paying attention to the right clues.

There will also almost never be just one piece of evidence used to build a case, but a complete set of gathered evidence — a data set. It’s the data scientist’s job to ask, “Which data holds the best evidence to prove our case right or wrong?”

Data scientists will survey the client or internal resources for available in-house data, and then discuss obtaining additional external data to complete the data set. This search for external data is more prevalent now than previously. The growth of external data sources and their value to the analytics process has ballooned with an increase in mobile data, images, telematics and sensor availability.

See also: The Science (and Art) of Data, Part 1

A typical data set might include, for example, typical external sources such as credit file data from credit reporting agencies and internal policy and claims data. This type of information is commonly used by actuaries in pricing models and is contained in state filings with insurance regulators. Choosing what features go into the data set is the result of dozens of questions and some close inspection. The task is to find the elements or features of the data set that have real value in answering the questions the insurer needs to answer.

In-house data, for example, might include premiums, number of exposures,    new and renewal policies and more. The external credit data may include information such as number of public records, number of mortgage accounts, number of accounts that are 30+ days past due among others. The goal at this point is to make sure that the data is as clean as possible. A target variable of interest might be something like frequency of claims, severity of claims, or loss ratio. This step is many times performed by in-house resources, insurance data analysts familiar with the organization’s available data, or external consultants such as Majesco.

At all points along the way, the data scientist is reviewing the data source’s suitability and integrity. An experienced analyst will often quickly discern the character and quality of the data by asking themselves, “Does the number of policies look correct for the size of the book of business? Does the average number of exposures per policy look correct? Does the overall loss ratio seem correct? Does the number of new and renewal policies look correct? Are there an unusually high number of missing or unexpected values in the data fields? Is there an apparent reason for something to look out of order? If not, how can the data fields be corrected? If they can’t be corrected, are the data issues so important that these fields should be dropped from the data set? Some whole record observations may clearly contain bad data and should be dropped from the data set. Even further, is the data so problematic that the whole data set should be redesigned or the whole analytics project should be scrapped?


Once the data set has been built, it is time for an in-depth analysis that steps closer toward solution development.

Exploratory Data Analysis

Exploratory data analysis takes the newly minted data set and begins to do something with it — “poking it” with measurements and variables to see how it might stand up in actual use. The data scientist runs preliminary tests on the “evidence.” The data set is subjected to a deeper look at its collective value. If the percentage of missing values is too large, the feature is probably not a good predictor variable and should be excluded from future analysis. In this phase, it may make sense to create more features, including mathematical transformations for non-linear relationships between the features and the target variable.

For non-statisticians, marketing managers and non-analytical staff, the details of exploratory data analysis can be tedious and uninteresting. Yet they are the crux of the genius involved in data science project methodology. Exploratory Data Analysis is where data becomes useful, so it is a part of the process that can’t be left undone. No matter what one thinks of the mechanics of the process, the preliminary questions and findings can be absolutely fascinating.

Questions such as these are common at this stage:

  • Does frequency increase as the number of accounts that are 30+ days past due increases? Is there a trend?
  • Does severity decrease as the number of mortgage trades decreases? Do these trends make sense?
  • Is the number of claims per policy greater for renewal policies than for new policies? Does this finding make sense? If not, is there an error in the way the data was prepared or in the source data itself?
  • If younger drivers have lower loss ratios, should this be investigated as an error in the data or an anomaly in the business? Some trends will not make any sense, and perhaps these features should be dropped from analysis or the data set redesigned.

See also: The Science (and Art) of Data, Part 2

The more we look at data sets, the more we realize that the limits to what can be discovered or uncovered are small and growing smaller. Thinking of relationships between personal behavior and buying patterns or between credit patterns and claims can fuel the interest of everyone in the organization. As the details of the evidence begin to gain clarity, the case also begins to come into focus. An apparent “solution” begins to appear and the data scientist is ready to build that solution.

In Part 3, we’ll look at what is involved in building and testing a data science project solution and how pilots are crucial to confirming project findings.

Data Science: Methods Matter (Part 1)

Why should an insurer employ data science? How does data science differ from any other business analytics that might be happening within the organization? What will it look like to bring data science methodology into the organization?

In nearly every engagement, Majesco’s data science team fields questions as foundational as these, as well as questions related to the details of business needs. Business leaders are smart to do their due diligence — asking IF data science will be valuable to the organization and HOW valuable it might be.

To provide a feel for how data science operates, in this first of three blog posts we will touch briefly on the history of data mining methodology, then look at what an insurer can expect when first engaging in the data science process. Throughout the series, we’re going to keep our eyes on the focus of all of our efforts: answers.


The goal of most data science is to apply the proper analysis to the right sets of data to provide answers. That proper analysis is just as important as the question an insurer is attempting to answer. After all, if we are in pursuit of meaningful business insights, we certainly don’t want to come to the wrong conclusions. There is nothing worse than moving your business full-speed ahead in the wrong direction based upon faulty analysis. Today’s analysis benefits from a thoughtfully constructed data project methodology.

As data mining was on the rise in the 1990s, it became apparent there were a thousand ways a data scientist might pursue answers to business questions. Some of those methods were useful and good, and some were suspect — they couldn’t truly be called methods. To help keep data scientists and their clients from arriving at the wrong conclusions, a methodology needed to be introduced. A defined yet flexible process would not only assist in managing a specific project scope but would also work toward verifying conclusions by building in pre-test and post-project monitoring against expected results. In 1996, the Cross Industry Process for Data Mining (CRISP-DM) was introduced, the first step in the standardization of data mining projects. Though CRISP-DM was a general data project methodology, insurance had its hand in the development. The Dutch insurer OHRA was one of the four sponsoring organizations to co-launch the standardization initiative.

See also: Data Science: Methods Matter

CRISP-DM has proven to be a strong foundation in the world of data science. Even though the number of available data streams has skyrocketed in the last 20 years and the tools and technology of analysis have improved, the overall methodology is still solid. Majesco uses a variance of CRISP-DM, honed over many years of experience in multiple industries.

Pursuing the right questions — Finding the business nugget in the data mine

Before data mining project methodologies were introduced, one issue companies had was a lack of substantial focus on obtainable goals. Projects didn’t always have a success definition that would help the business in the end. Research could be vague, and methods could be transient.

Research needs focus, so the key ingredient in data science methodology is business need. The insurer has a problem it wishes to solve. It has a question that has no readily apparent answer. If an insurer hasn’t used data scientists, this is a frequent point of entry. It is also the one of the greatest differentiators between traditional in-house data analysis and project-based data science methodology. Instead of tracking trends, data science methodology is focused on  finding clear answers to defined questions. Normally these issues are more difficult to solve and represent a greater business risk, making it easy to justify seeking outside assistance.

Project Design — First meeting and first steps

Phase 1 of a data science project life cycle is project design. This phase is about listening and learning about the business problem (or problems) that are ready to be addressed. For example, a P&C insurer might be wondering why loyalty is lowest in the three states where it has the highest claims — Florida, Georgia and Texas. Is this an anomaly, or is there a correlation between the two statistics? A predictive model could be built to predict the likelihood of attrition. The model score could then be used to determine what actions should be taken to reward and keep a good customer, or perhaps what actions could be taken to remove frequent or high-risk claimants from the books.

The insurer must unpack background and pain points. Does the customer have access to all of the data that is needed for analysis? Should the project be segmented in such a way that it provides for detailed analysis at multiple levels? For example, the insurer may need to run the same type of claims analysis across personal auto, commercial vehicle, individual home and business property. These would represent segmented claims models under the same project.

See also: What Comes After Big Data

The insurer must identify assumptions, definitions, possible solutions and a picture of the risks involved for the project, sorting out areas where segmented analysis may be needed. The team must also collect some information to assist in creating a cost-benefit analysis for the project.

As a part of the project design meetings, the company must identify the analytic techniques that will be used and discuss the features the analysis can use. At the end of the project design phase, everyone knows which answers they are seeking and the questions that will be used to frame those answers. They have a clear understanding of the data that is available for their use and have an outline of the full project.

With the clarity to move forward, the insurers move into a closer examination of the data that will be used.

In Part 2, we will look at the two-step data preparation process that is essential to building an effective solution. We will also look at how the proliferation of data sources is supplying insurers with greater analytic opportunities than ever.