May 26, 2016
Data Science: Methods Matter (Part 1)
There is nothing worse than moving your business full-speed ahead in the wrong direction based on faulty analysis.
Why should an insurer employ data science? How does data science differ from any other business analytics that might be happening within the organization? What will it look like to bring data science methodology into the organization?
In nearly every engagement, Majesco’s data science team fields questions as foundational as these, as well as questions related to the details of business needs. Business leaders are smart to do their due diligence — asking IF data science will be valuable to the organization and HOW valuable it might be.
To provide a feel for how data science operates, in this first of three blog posts we will touch briefly on the history of data mining methodology, then look at what an insurer can expect when first engaging in the data science process. Throughout the series, we’re going to keep our eyes on the focus of all of our efforts: answers.
The goal of most data science is to apply the proper analysis to the right sets of data to provide answers. That proper analysis is just as important as the question an insurer is attempting to answer. After all, if we are in pursuit of meaningful business insights, we certainly don’t want to come to the wrong conclusions. There is nothing worse than moving your business full-speed ahead in the wrong direction based upon faulty analysis. Today’s analysis benefits from a thoughtfully constructed data project methodology.
As data mining was on the rise in the 1990s, it became apparent there were a thousand ways a data scientist might pursue answers to business questions. Some of those methods were useful and good, and some were so suspect that they couldn’t truly be called methods. To help keep data scientists and their clients from arriving at the wrong conclusions, a methodology needed to be introduced. A defined yet flexible process would not only assist in managing a specific project scope but would also work toward verifying conclusions by building in pre-test and post-project monitoring against expected results. In 1996, the Cross-Industry Standard Process for Data Mining (CRISP-DM) was introduced, the first step in the standardization of data mining projects. Though CRISP-DM was a general data project methodology, insurance had its hand in the development. The Dutch insurer OHRA was one of the four sponsoring organizations to co-launch the standardization initiative.
CRISP-DM has proven to be a strong foundation in the world of data science. Even though the number of available data streams has skyrocketed in the last 20 years and the tools and technology of analysis have improved, the overall methodology is still solid. Majesco uses a variant of CRISP-DM, honed over many years of experience in multiple industries.
Pursuing the right questions — Finding the business nugget in the data mine
Before data mining project methodologies were introduced, one issue companies had was a lack of substantial focus on obtainable goals. Projects didn’t always have a success definition that would help the business in the end. Research could be vague, and methods could be transient.
Research needs focus, so the key ingredient in data science methodology is business need. The insurer has a problem it wishes to solve. It has a question that has no readily apparent answer. If an insurer hasn’t used data scientists, this is a frequent point of entry. It is also one of the greatest differentiators between traditional in-house data analysis and project-based data science methodology. Instead of tracking trends, data science methodology is focused on finding clear answers to defined questions. Normally these issues are more difficult to solve and represent a greater business risk, making it easy to justify seeking outside assistance.
Project Design — First meeting and first steps
Phase 1 of a data science project life cycle is project design. This phase is about listening and learning about the business problem (or problems) that are ready to be addressed. For example, a P&C insurer might be wondering why loyalty is lowest in the three states where it has the highest claims — Florida, Georgia and Texas. Is this an anomaly, or is there a correlation between the two statistics? A predictive model could be built to predict the likelihood of attrition. The model score could then be used to determine what actions should be taken to reward and keep a good customer, or perhaps what actions could be taken to remove frequent or high-risk claimants from the books.
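To make the idea of acting on a model score concrete, here is a minimal sketch in Python. It assumes an attrition model already exists and returns a score between 0 and 1; the `Policyholder` fields, the `recommend_action` function, the action labels and the 0.6 threshold are all hypothetical, chosen only for illustration. In a real project, the rules would come out of the project design discussions.

```python
from dataclasses import dataclass


@dataclass
class Policyholder:
    customer_id: str
    attrition_score: float    # hypothetical model output, 0.0 (stays) to 1.0 (leaves)
    high_risk_claimant: bool  # hypothetical flag from a separate claims analysis


def recommend_action(p: Policyholder, attrition_threshold: float = 0.6) -> str:
    """Map a model score to a next step. The rules here are illustrative only."""
    if p.high_risk_claimant:
        # Frequent or high-risk claimants are routed to underwriting review
        # rather than a retention campaign.
        return "review_for_nonrenewal"
    if p.attrition_score >= attrition_threshold:
        # A good customer who is likely to leave gets a retention offer.
        return "retention_offer"
    return "no_action"


if __name__ == "__main__":
    sample = Policyholder("C-001", attrition_score=0.82, high_risk_claimant=False)
    print(sample.customer_id, recommend_action(sample))
```

The point of the sketch is that the score alone does not decide anything. The business rules wrapped around it do, which is why those rules belong in the project design conversation.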
The insurer must unpack background and pain points. Does the insurer have access to all of the data that is needed for analysis? Should the project be segmented in such a way that it provides for detailed analysis at multiple levels? For example, the insurer may need to run the same type of claims analysis across personal auto, commercial vehicle, individual home and business property. These would represent segmented claims models under the same project.
The insurer must identify assumptions, definitions, possible solutions and a picture of the risks involved for the project, sorting out areas where segmented analysis may be needed. The team must also collect some information to assist in creating a cost-benefit analysis for the project.
As a part of the project design meetings, the company must identify the analytic techniques that will be used and discuss the features the analysis can use. At the end of the project design phase, everyone knows which answers they are seeking and the questions that will be used to frame those answers. They have a clear understanding of the data that is available for their use and have an outline of the full project.
With the clarity to move forward, the insurer moves into a closer examination of the data that will be used.
In Part 2, we will look at the two-step data preparation process that is essential to building an effective solution. We will also look at how the proliferation of data sources is supplying insurers with greater analytic opportunities than ever.