Data Science: Methods Matter (Part 2)

What makes data science a science?


When data analytics crosses the line into simple formulas, heavy conjecture and an arbitrary methodology, it often fails at what it was designed to do: give accurate answers to pressing questions.

So at Majesco, we pursue a proven data science methodology in an attempt to lower the risk of misapplying data and to improve predictive results. In Methods Matter, Part 1, we provided a picture of the methodology that goes into data science. We discussed CRISP-DM and the opening phase of the life cycle, project design.

In Part 2, we will discuss the heart of the life cycle: the data itself. To do that, we’ll take an in-depth look at two central steps: building a data set and exploratory data analysis. Together, these two steps make up a phase that is critical to project success, and they illustrate why data analytics is more complex than many insurers realize.

Building a Data Set

Building a data set, in one way, is no different than gathering evidence to solve a mystery or a criminal case. The best case will be built with verifiable evidence. The best evidence will be gathered by paying attention to the right clues.

A case is almost never built on a single piece of evidence, but on a complete set of gathered evidence: a data set. It’s the data scientist’s job to ask, “Which data holds the best evidence to prove our case right or wrong?”

Data scientists will survey the client or internal resources for available in-house data, and then discuss obtaining additional external data to complete the data set. This search for external data is more prevalent now than previously. External data sources, and their value to the analytics process, have ballooned with the increase in mobile data, images, telematics and sensor availability.

A typical data set might combine external sources, such as credit file data from credit reporting agencies, with internal policy and claims data. This type of information is commonly used by actuaries in pricing models and is contained in state filings with insurance regulators. Choosing which features go into the data set is the result of dozens of questions and some close inspection. The task is to find the elements or features of the data set that have real value in answering the questions the insurer needs to answer.

In-house data, for example, might include premiums, number of exposures, new and renewal policies and more. The external credit data may include information such as number of public records, number of mortgage accounts and number of accounts that are 30+ days past due, among others. The goal at this point is to make sure that the data is as clean as possible. A target variable of interest might be something like frequency of claims, severity of claims or loss ratio. This step is often performed by in-house resources, insurance data analysts familiar with the organization’s available data, or external consultants such as Majesco.
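To make the three candidate target variables concrete, here is a minimal sketch in Python. It assumes the usual actuarial definitions (frequency as claims per exposure, severity as loss per claim, loss ratio as loss per premium dollar). The record layout and field names are hypothetical, and the three sample policies are made-up numbers for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PolicyRecord:
    # Hypothetical fields; real names depend on the insurer's systems.
    earned_premium: float
    exposures: float
    claim_count: int
    incurred_loss: float


def target_variables(records):
    """Aggregate the three candidate targets over a list of PolicyRecord."""
    premium = sum(r.earned_premium for r in records)
    exposures = sum(r.exposures for r in records)
    claims = sum(r.claim_count for r in records)
    losses = sum(r.incurred_loss for r in records)
    return {
        # Claims per unit of exposure.
        "frequency": claims / exposures if exposures else None,
        # Average loss per claim.
        "severity": losses / claims if claims else None,
        # Losses per dollar of premium.
        "loss_ratio": losses / premium if premium else None,
    }


# Made-up sample book of three policies.
book = [
    PolicyRecord(1200.0, 1.0, 0, 0.0),
    PolicyRecord(900.0, 1.0, 1, 2500.0),
    PolicyRecord(1500.0, 2.0, 1, 500.0),
]
targets = target_variables(book)
```

Each target returns None when its denominator is zero, so an empty or claim-free segment does not raise an error.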

At all points along the way, the data scientist is reviewing the data source’s suitability and integrity. An experienced analyst will often quickly discern the character and quality of the data by asking questions such as: Does the number of policies look correct for the size of the book of business? Does the average number of exposures per policy look correct? Does the overall loss ratio seem correct? Does the number of new and renewal policies look correct? Is there an unusually high number of missing or unexpected values in the data fields? Is there an apparent reason for something to look out of order? If not, how can the data fields be corrected? If they can’t be corrected, are the data issues so important that these fields should be dropped from the data set? Some whole record observations may clearly contain bad data and should be dropped from the data set. Even further, is the data so problematic that the whole data set should be redesigned or the whole analytics project should be scrapped?
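Several of these integrity questions can be answered with a quick profile of the data set. The sketch below is one possible way to do that in plain Python. It assumes each record is a dictionary, and the field names (premium, exposures, losses, is_new) are hypothetical. The four sample rows are invented for illustration.

```python
def profile_data_set(rows, fields):
    """Summarize basic integrity measures for a list of dict records."""
    n = len(rows)
    # Missing numeric values count as zero in the totals below; the
    # missing_share entry reports how often that happened per field.
    total_exposures = sum(r.get("exposures") or 0 for r in rows)
    total_premium = sum(r.get("premium") or 0 for r in rows)
    total_losses = sum(r.get("losses") or 0 for r in rows)
    missing = {f: sum(1 for r in rows if r.get(f) is None) for f in fields}
    return {
        "policy_count": n,
        "avg_exposures_per_policy": total_exposures / n if n else None,
        "overall_loss_ratio": (
            total_losses / total_premium if total_premium else None
        ),
        "new_policies": sum(1 for r in rows if r.get("is_new") is True),
        "renewal_policies": sum(1 for r in rows if r.get("is_new") is False),
        "missing_share": {f: missing[f] / n if n else None for f in fields},
    }


# Made-up sample rows; None marks a missing value.
rows = [
    {"premium": 1000.0, "exposures": 1.0, "losses": 0.0, "is_new": True},
    {"premium": 800.0, "exposures": 2.0, "losses": 900.0, "is_new": False},
    {"premium": 1200.0, "exposures": None, "losses": 300.0, "is_new": False},
    {"premium": 1000.0, "exposures": 1.0, "losses": 0.0, "is_new": None},
]
profile = profile_data_set(rows, ["premium", "exposures", "losses", "is_new"])
```

The profile only surfaces the numbers. Deciding whether a loss ratio or a policy count "looks correct" still depends on the analyst's knowledge of the book of business.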


Once the data set has been built, it is time for an in-depth analysis that steps closer toward solution development.

Exploratory Data Analysis

Exploratory data analysis takes the newly minted data set and begins to do something with it — “poking it” with measurements and variables to see how it might stand up in actual use. The data scientist runs preliminary tests on the “evidence.” The data set is subjected to a deeper look at its collective value. If the percentage of missing values is too large, the feature is probably not a good predictor variable and should be excluded from future analysis. In this phase, it may make sense to create more features, including mathematical transformations for non-linear relationships between the features and the target variable.
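As a rough illustration of these two ideas, the Python sketch below screens out features with too many missing values and then applies a simple mathematical transformation. The 30% cutoff is an assumption chosen for the example, not a rule from our methodology. The log(1 + x) transform is just one common choice for skewed count data, and the feature values are invented.

```python
import math


def missing_share(values):
    """Fraction of entries that are None."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def select_features(columns, max_missing=0.30):
    """Keep features whose missing share is at or below max_missing.

    The 0.30 default is an illustrative assumption, not a standard.
    """
    return {
        name: vals
        for name, vals in columns.items()
        if missing_share(vals) <= max_missing
    }


def log1p_transform(values):
    """Apply log(1 + x) to each non-missing, non-negative value."""
    return [None if v is None else math.log1p(v) for v in values]


# Made-up credit features; None marks a missing value.
columns = {
    "num_public_records": [0, 1, None, 0, 2],        # 20% missing
    "num_mortgage_accounts": [None, None, 1, None, 0],  # 60% missing
}
kept = select_features(columns)
transformed = log1p_transform(kept["num_public_records"])
```

Here num_mortgage_accounts is dropped because 60% of its values are missing, while num_public_records is kept and transformed with its missing entry left in place.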

For non-statisticians, marketing managers and non-analytical staff, the details of exploratory data analysis can be tedious and uninteresting. Yet they are the crux of the genius involved in data science project methodology. Exploratory Data Analysis is where data becomes useful, so it is a part of the process that can’t be left undone. No matter what one thinks of the mechanics of the process, the preliminary questions and findings can be absolutely fascinating.

Questions such as these are common at this stage:

  • Does frequency increase as the number of accounts that are 30+ days past due increases? Is there a trend?
  • Does severity decrease as the number of mortgage trades decreases? Do these trends make sense?
  • Is the number of claims per policy greater for renewal policies than for new policies? Does this finding make sense? If not, is there an error in the way the data was prepared or in the source data itself?
  • If younger drivers have lower loss ratios, should this be investigated as an error in the data or an anomaly in the business? Some trends will not make any sense, and perhaps these features should be dropped from analysis or the data set redesigned.
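The first question in the list above can be checked with a simple grouping. The Python sketch below computes claim frequency for each count of 30+ days past due accounts and tests whether frequency rises from one group to the next. The field names are hypothetical and the rows are made-up numbers. A real analysis would also consider how much exposure sits behind each group before trusting a trend.

```python
from collections import defaultdict


def frequency_by_group(rows, key):
    """Claim frequency (claims / exposures) for each value of `key`."""
    claims = defaultdict(float)
    exposures = defaultdict(float)
    for r in rows:
        claims[r[key]] += r["claim_count"]
        exposures[r[key]] += r["exposures"]
    # Skip groups with no exposure to avoid dividing by zero.
    return {
        k: claims[k] / exposures[k]
        for k in sorted(exposures)
        if exposures[k] > 0
    }


def is_increasing(freq_by_group):
    """True if frequency never falls as the group value rises."""
    values = [freq_by_group[k] for k in sorted(freq_by_group)]
    return all(a <= b for a, b in zip(values, values[1:]))


# Made-up sample rows.
rows = [
    {"past_due_30": 0, "exposures": 6.0, "claim_count": 1},
    {"past_due_30": 0, "exposures": 4.0, "claim_count": 0},
    {"past_due_30": 1, "exposures": 5.0, "claim_count": 1},
    {"past_due_30": 2, "exposures": 4.0, "claim_count": 2},
]
freq = frequency_by_group(rows, "past_due_30")
trend_up = is_increasing(freq)
```

The same grouping works for the other questions, for example by passing a new-versus-renewal flag as the key and comparing the two resulting frequencies.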

The more we look at data sets, the more we realize that there are few limits to what can be discovered or uncovered, and those limits are shrinking. Thinking of relationships between personal behavior and buying patterns or between credit patterns and claims can fuel the interest of everyone in the organization. As the details of the evidence begin to gain clarity, the case also begins to come into focus. An apparent “solution” begins to appear and the data scientist is ready to build that solution.

In Part 3, we’ll look at what is involved in building and testing a data science project solution and how pilots are crucial to confirming project findings.

Data Science: Methods Matter (Part 1)

Why should an insurer employ data science? How does data science differ from any other business analytics that might be happening within the organization? What will it look like to bring data science methodology into the organization?

In nearly every engagement, Majesco’s data science team fields questions as foundational as these, as well as questions related to the details of business needs. Business leaders are smart to do their due diligence — asking IF data science will be valuable to the organization and HOW valuable it might be.

To provide a feel for how data science operates, in this first of three blog posts we will touch briefly on the history of data mining methodology, then look at what an insurer can expect when first engaging in the data science process. Throughout the series, we’re going to keep our eyes on the focus of all of our efforts: answers.


The goal of most data science is to apply the proper analysis to the right sets of data to provide answers. That proper analysis is just as important as the question an insurer is attempting to answer. After all, if we are in pursuit of meaningful business insights, we certainly don’t want to come to the wrong conclusions. There is nothing worse than moving your business full-speed ahead in the wrong direction based upon faulty analysis. Today’s analysis benefits from a thoughtfully constructed data project methodology.

As data mining was on the rise in the 1990s, it became apparent there were a thousand ways a data scientist might pursue answers to business questions. Some of those methods were useful and good, and some were suspect; they couldn’t truly be called methods. To help keep data scientists and their clients from arriving at the wrong conclusions, a methodology needed to be introduced. A defined yet flexible process would not only assist in managing a specific project scope but would also work toward verifying conclusions by building in pre-test and post-project monitoring against expected results. In 1996, the Cross-Industry Standard Process for Data Mining (CRISP-DM) was introduced as a first step toward standardizing data mining projects. Though CRISP-DM was a general data project methodology, insurance had a hand in its development. The Dutch insurer OHRA was one of the four sponsoring organizations to co-launch the standardization initiative.

CRISP-DM has proven to be a strong foundation in the world of data science. Even though the number of available data streams has skyrocketed in the last 20 years and the tools and technology of analysis have improved, the overall methodology is still solid. Majesco uses a variant of CRISP-DM, honed over many years of experience in multiple industries.

Pursuing the right questions — Finding the business nugget in the data mine

Before data mining project methodologies were introduced, one issue companies had was a lack of substantial focus on obtainable goals. Projects didn’t always have a success definition that would help the business in the end. Research could be vague, and methods could be transient.

Research needs focus, so the key ingredient in data science methodology is business need. The insurer has a problem it wishes to solve. It has a question that has no readily apparent answer. If an insurer hasn’t used data scientists, this is a frequent point of entry. It is also one of the greatest differentiators between traditional in-house data analysis and project-based data science methodology. Instead of tracking trends, data science methodology is focused on finding clear answers to defined questions. Normally these issues are more difficult to solve and represent a greater business risk, making it easy to justify seeking outside assistance.

Project Design — First meeting and first steps

Phase 1 of a data science project life cycle is project design. This phase is about listening and learning about the business problem (or problems) that are ready to be addressed. For example, a P&C insurer might be wondering why loyalty is lowest in the three states where it has the highest claims — Florida, Georgia and Texas. Is this an anomaly, or is there a correlation between the two statistics? A predictive model could be built to predict the likelihood of attrition. The model score could then be used to determine what actions should be taken to reward and keep a good customer, or perhaps what actions could be taken to remove frequent or high-risk claimants from the books.

The insurer must unpack background and pain points. Does the customer have access to all of the data that is needed for analysis? Should the project be segmented in such a way that it provides for detailed analysis at multiple levels? For example, the insurer may need to run the same type of claims analysis across personal auto, commercial vehicle, individual home and business property. These would represent segmented claims models under the same project.

The insurer must identify assumptions, definitions, possible solutions and a picture of the risks involved for the project, sorting out areas where segmented analysis may be needed. The team must also collect some information to assist in creating a cost-benefit analysis for the project.

As a part of the project design meetings, the company must identify the analytic techniques that will be used and discuss the features the analysis can use. At the end of the project design phase, everyone knows which answers they are seeking and the questions that will be used to frame those answers. They have a clear understanding of the data that is available for their use and have an outline of the full project.

With the clarity to move forward, the insurer moves into a closer examination of the data that will be used.

In Part 2, we will look at the two-step data preparation process that is essential to building an effective solution. We will also look at how the proliferation of data sources is supplying insurers with greater analytic opportunities than ever.

Pursue Innovation or Transformation?

People often use “innovation” and “transformation” synonymously. Both words evoke thoughts of change and modernization. Both are important outcomes of the change management life cycle. We know innovation is occurring all around us, as people find ways to improve things. Transformation, by contrast, is the result of moving from one state to another.

So, should we innovate, or should we transform? Should we do both? What’s the difference?

I think the best way to answer is to explore the differences between innovation and transformation in insurance and break each down into definitions and examples. Then, you can determine where your company is in the change management life cycle and decide what will work best for your organization.

What SMA has found, as we work with insurers, is that innovation and transformation are not only distinct, but are decidedly different, and their impact on organizations is different, too.


Innovation is defined by SMA as rethinking, reimagining and reinventing the business of insurance. We have seen the results of innovation in business models, customer relationships, new products and new services and in how investments in technology are made. A perfect example of innovation is the focused improvement of the customer experience.

Once, customers had to push paper and make phone calls back and forth with agents, but innovation has made these types of interactions a distant memory. Today, chat, apps, portals, mobile, etc. have started to create a new customer experience. It is different and (arguably) better. Similarly, in the data analytics arena, innovation in the ways we use and apply data has changed the way insurers operate, price policies, handle claims and compete in the market. Innovation makes something that once seemed impossible possible. It is rethinking the way of doing things, questioning the possibilities and turning them into action.


Transformation is the evolution or journey from a current level to a different and better state. SMA describes it as modernizing and optimizing. Like innovation, transformations produce an improved state, but we can also measure the journey as an outcome of the process. The journey of transformation is the tangible process, structure or building block for future success, even if a project fails or takes a different shape. Transformation can be seen in core system replacements, during which existing and necessary underwriting, billing and claims capabilities are shifted and moved into a better state through improved technology and processes. Transformations are evolutionary and occur over time.

Our recent research reveals that approximately 13% of insurers self-identify as innovating, while 45% identify as transforming. That’s a pretty sizable difference. The data suggests that transforming existing systems and processes is a necessary effort in today’s world. SMA predicts the percentages of both innovating and transforming insurers will continue to grow.

Most insurers need both!