
Data Science: Methods Matter (Part 4)

Putting a data science solution into production after weeks or months of hard work is undoubtedly the most fun and satisfying part. Models do not exist for their own sakes; they exist to make a positive change in the business. Models that are not in production have not realized their true value. Putting models into production involves not only testing and implementation, but also a plan for monitoring and updating the analytics as time goes on. We’ll walk through these in a moment and see how the methods we employ will allow us to get the maximum benefit from our investment of time and effort.

First, let’s review briefly where we’ve been. In Part 1 of our series on Data Science Methods, we discussed CRISP-DM, a data project methodology that is now in common use across industries, and we looked at the reasons insurers pursue data science and at the first step, project design. In Part 2, we looked at building a data set and exploratory data analysis. In Part 3, we covered what is involved in building a solution, including setting up the data in the right way to validate the solution.

Now, we are ready for the launch phase. Just like NASA, data scientists need green lights across the board, only launching when they are perfectly ready and when they have addressed virtually every concern.

See also: The Science (and Art) of Data, Part 2  

Test and Implement

Once an analytic model has been built and shown to perform well in the lab, it’s time to deploy it into the wild: a real, live production environment. Many companies are hesitant to simply flip a switch and move their business processes from one approach to a new one. They prefer to take a more cautious path and implement a solution in steps or phases, often using either an A/B test-and-control approach or a phased geographic deployment. In an A/B test, the business results of the new analytic solution are compared with those of the solution that has been used in the past. For example, leads in a marketing campaign are randomly allocated, 50% to the new approach and 50% to the old. If the results from the new solution are superior, it is fully implemented and the old solution retired. In a phased geographic deployment, if results in one region of the country look promising, the solution is then rolled out nationwide.
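As a minimal sketch of what that random A/B split might look like in practice (the lead identifiers, 50/50 split and outcome structure here are illustrative assumptions, not details from the project):

```python
import random

def assign_groups(lead_ids, new_share=0.5, seed=42):
    """Randomly allocate marketing leads to the new model ("new") or the existing approach ("old")."""
    rng = random.Random(seed)
    return {lead_id: ("new" if rng.random() < new_share else "old") for lead_id in lead_ids}

def conversion_rate(outcomes, assignments, group):
    """Share of leads in one group that converted; outcomes maps lead_id -> True/False."""
    ids = [i for i, g in assignments.items() if g == group]
    return sum(outcomes[i] for i in ids) / len(ids)

# Example: after the campaign runs, compare the two groups before deciding on a full rollout.
lead_ids = [f"lead-{n}" for n in range(1000)]
assignments = assign_groups(lead_ids)
# outcomes would come from campaign results, e.g. {"lead-0": True, "lead-1": False, ...}
```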

Depending on the computing platform, the code base of the analytic solution may be automatically dropped into existing business processes. Scores may be generated live or in batch, depending on the need. Marketing, for instance, is a good candidate for batch-processed results: the data project may have been designed to pre-select good candidates for insurance who are also likely respondents, and a batch run would return an entire prospect group from within the data pool at once.
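As a rough illustration of what such a batch run could look like, here is a sketch; the model object, feature columns and cutoff are assumptions for illustration only:

```python
import pandas as pd

def batch_score(prospects: pd.DataFrame, model, cutoff: float = 0.7) -> pd.DataFrame:
    """Score an entire prospect pool in one pass and keep the likely respondents.

    `model` is any fitted object with a scikit-learn-style predict_proba method;
    the feature columns and cutoff are illustrative only.
    """
    features = prospects[["age", "prior_policies", "region_code"]]
    prospects = prospects.assign(score=model.predict_proba(features)[:, 1])
    return prospects[prospects["score"] >= cutoff].sort_values("score", ascending=False)
```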

Live results meet a completely different set of objectives. Giving a broker a real-time indication of our appetite to quote a particular piece of business would be a common use of real-time scoring.
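Real-time scoring, by contrast, usually sits behind a small service that the quoting workflow calls once per submission. A minimal sketch using Flask; the route, request fields, placeholder model call and appetite threshold are all assumptions:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def score_submission(submission: dict) -> float:
    # Placeholder for the deployed model; a real service would load the model once at startup.
    return 0.75

@app.route("/appetite", methods=["POST"])
def appetite():
    """Return a real-time appetite indication for a single piece of business."""
    submission = request.get_json()
    score = score_submission(submission)
    return jsonify({"score": score, "indication": "quote" if score >= 0.6 else "decline"})
```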

Sometimes, moving a model to production requires additional coding. This occurs, for example, when a model is built and proven in R but the deployed version has to be implemented in C for performance or platform reasons. The code has to be translated into the new language, and checks must be performed to confirm that the variables, the final scores and the values passed to end users are all correct.
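A common way to verify a translated implementation is to run the same sample of records through both code paths and confirm the scores agree within a small tolerance. A sketch, where r_score and c_score are hypothetical wrappers around the original and translated implementations:

```python
import math

def check_parity(records, r_score, c_score, tolerance=1e-6):
    """Return records whose scores from the original and translated implementations disagree."""
    mismatches = []
    for record in records:
        original, translated = r_score(record), c_score(record)
        if not math.isclose(original, translated, abs_tol=tolerance):
            mismatches.append((record, original, translated))
    return mismatches

# An empty result from check_parity(sample, r_score, c_score) is one piece of evidence
# that the translation preserved the model's behavior.
```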

Monitor and Update

Some data projects are “one time only”: once the data has answered the question, business strategies can be built to support that answer. Others, however, are designed for long-term use and reuse. These can be very valuable over their lifetimes, but special considerations apply when the plan is to reuse the analytic components of a data project. If a model’s performance starts to drift over time, you want to manage that drift as it happens. Monitoring and updating help the project hold its business value, rather than letting that value erode as variables and circumstances change. Effective monitoring is insurance for data science models.

For example, a model designed to repeatedly identify “good” candidates for a particular life product may give excellent results at the outset. As the economy or demographics change, credit scoring may begin to exclude good candidates. As health data exchanges improve, new data streams may become better indicators of overall health. Algorithms or data sets may need to be adapted. Minor tweaks may suffice, or a whole new project may prove to be the best option if business conditions have drastically changed. Comparing current business results with results at the outset, and tracking them over time, allows insurers to identify analysis features that no longer provide valid results.

See also: Competing in an Age of Data Symmetry

Monitoring is important enough that it must go beyond running periodic reports and trusting hunches that the models have not lost effectiveness. Monitoring needs its own plan. How often will reports run? What criteria can we use to validate that the model is still working? Which indicators will tell us that the model is beginning to fail? These criteria are identified by both the data scientists and the business users who are in touch with the business strategy. Depending on the project and previous experience, data scientists may even know intuitively which components of the method are likely to slide out of balance, and they can create criteria to monitor those areas more closely.
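One commonly used criterion for this kind of monitoring is the population stability index (PSI), which compares the current score distribution with the distribution at launch; values above roughly 0.2 are often treated as a warning that the model needs attention. A sketch (the bin count and threshold are conventions, not requirements):

```python
import numpy as np

def population_stability_index(baseline_scores, current_scores, bins=10):
    """Measure how far the current score distribution has drifted from the launch distribution."""
    edges = np.histogram_bin_edges(baseline_scores, bins=bins)
    base_pct = np.histogram(baseline_scores, bins=edges)[0] / len(baseline_scores)
    curr_pct = np.histogram(current_scores, bins=edges)[0] / len(current_scores)
    # Guard against empty bins before taking logs.
    base_pct = np.clip(base_pct, 1e-6, None)
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))
```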

Updating the model breathes new life into the original work. Depending on what may be happening to the overall solution, the data scientist will know whether a small tweak to a formula is called for or an entirely new solution needs to be built based on new data models. An update saves as much of the original time investment as possible without jeopardizing the results.

Though the methodology may seem complicated, and there seem to be many steps, the results are what matter. Insurance data science continually fuels the business with answers of competitive and operational value. It captures accurate images of reality and allows users to make the best decisions. As data streams grow in availability and use, insurance data science will be poised to make the most of them.

7 Ways Your Data Can Hurt You

Your data could be your most valuable asset, and participants in the workers’ compensation industry have loads of it available because they have been collecting and storing data for decades. Yet few analyze that data to improve processes and outcomes or to take action in a timely way.

Analytics (data analysis) is crucial to all businesses today for gaining insight into product and service quality and business profitability, and for measuring the value contributed. But the processes for collecting, analyzing and reporting data need to be examined. Begin by examining these seven ways data can hurt or help.

1. Data silos

Data silos are common in workers’ compensation. Individual data sets are used within organizations and by their vendors to document claim activity. Without interoperability (the ability of a system to work with other systems without special effort on the part of the user) or data integration, the silos naturally fragment the data, making it difficult to gain full understanding of the claim and its multiple issues. A comprehensive view of a claim includes all its associated data.
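In practice, removing a silo often comes down to joining each system’s extract on a shared claim identifier. A minimal sketch; the file names and columns are invented for illustration:

```python
import pandas as pd

# Each source system exports its own slice of the claim; columns shown are illustrative.
claims = pd.read_csv("claims_core.csv")      # claim_id, injury_date, body_part, status
medical = pd.read_csv("medical_bills.csv")   # claim_id, bill_amount, cpt_code
pharmacy = pd.read_csv("pharmacy.csv")       # claim_id, ndc_code, days_supply

# Join on the shared claim identifier to assemble one comprehensive view of each claim.
full_view = (
    claims
    .merge(medical, on="claim_id", how="left")
    .merge(pharmacy, on="claim_id", how="left")
)
```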

2. Unstructured data

Unstructured documentation, in the form of notes, leaves valuable information on the table. Notes sections of systems contain important information that cannot be readily integrated into the business intelligence. The cure is to incorporate data elements such as drop-down lists to describe events, facts and actions taken. Such data elements provide claim knowledge and can be monitored and measured.
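Concretely, “incorporating data elements” can mean capturing each claim event as a constrained record instead of a free-text note, so the values can be counted and trended. A hypothetical sketch, with event categories invented for illustration:

```python
from dataclasses import dataclass
from enum import Enum

class EventType(Enum):
    """Drop-down style choices that replace free-text descriptions; categories are illustrative."""
    RETURN_TO_WORK = "return_to_work"
    SURGERY_SCHEDULED = "surgery_scheduled"
    ATTORNEY_INVOLVED = "attorney_involved"

@dataclass
class ClaimEvent:
    claim_id: str
    event_type: EventType
    event_date: str
    notes: str = ""  # free text stays available, but the structured fields drive the reporting

# A structured event can be aggregated and monitored; a paragraph of adjuster notes cannot.
event = ClaimEvent("WC-1001", EventType.SURGERY_SCHEDULED, "2016-05-01")
```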

3. Errors and omissions

Manual data entry is tedious work and often results in skipped data fields and erroneous content. When users are unsure of what should be entered into a data field, they might make up the input or simply skip the task. Management has a responsibility to hold data entry people accountable for what they add to the system. It matters.
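Simple field-level validation at the point of entry catches many of these problems before they reach the data set. A sketch, with required fields and allowed values invented for illustration:

```python
REQUIRED_FIELDS = ("claim_id", "injury_date", "body_part", "jurisdiction")
VALID_BODY_PARTS = {"back", "shoulder", "knee", "hand", "other"}  # illustrative list

def validate_entry(record: dict) -> list:
    """Return a list of problems found in a manually entered claim record."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)]
    body_part = record.get("body_part")
    if body_part and body_part not in VALID_BODY_PARTS:
        problems.append(f"unrecognized body_part: {body_part}")
    return problems
```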

Errors and omissions can also occur when data is extracted through OCR (optical character recognition), the computerized recognition of printed or written text characters. OCR output should be reviewed regularly for accuracy and to be sure the entire scope of content is being retrieved and added to the data set. Changing business needs may result in new data requirements.
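A regular accuracy review can be as simple as sampling extracted documents, keying the same fields by hand, and tracking field-level agreement. A sketch; the field names are assumptions:

```python
def field_accuracy(ocr_records, manual_records,
                   fields=("provider_name", "bill_amount", "date_of_service")):
    """Share of sampled documents where the OCR value matches the manually keyed value, per field."""
    results = {}
    for field in fields:
        matches = sum(
            1 for ocr, manual in zip(ocr_records, manual_records)
            if str(ocr.get(field, "")).strip().lower() == str(manual.get(field, "")).strip().lower()
        )
        results[field] = matches / len(ocr_records)
    return results
```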

4. Human factors

Other human factors also affect data quality. One is intimidation by IT (information technology). Usually this is not intended, but remember that people in IT are not claims adjusters or case managers. The things of interest and concern to them can be completely different, and they use different language to describe those things.

People in business units often have difficulty describing to IT what they need or want. When IT says a request will be difficult or time-consuming, the best response is to persist.

5. Timeliness

There needs to be timely, appropriate reporting of critical information found in current data. The data can often reveal important facts that can be reported automatically and acted on quickly to minimize damage. Systems should continually monitor the data and report on it, gaining workflow efficiencies. Time is of the essence.
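Continuous monitoring of this kind is often little more than a scheduled query plus an alert rule. A sketch; the status values and contact window are illustrative assumptions:

```python
from datetime import date

def overdue_claims(claims, today=None, max_days_without_contact=3):
    """Flag open claims with no adjuster contact inside the allowed window (threshold is illustrative)."""
    today = today or date.today()
    return [
        claim for claim in claims
        if claim["status"] == "open"
        and (today - claim["last_contact_date"]).days > max_days_without_contact
    ]

# Run on a schedule and route the results to the responsible adjuster or supervisor.
```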

6. Data fraud

Fraud finds its way into workers’ compensation in many ways, even into its data. The most common data fraud is found in billing: overbilling, misrepresenting diagnoses to justify procedures and duplicate billing are a few of the methods. Bill review companies work to uncover these schemes.

Another, less obvious means of fraud is confusion. A provider may use multiple tax IDs or NPIs (national provider identifiers) to obscure the fact that a whole set of bills is coming from the same individual or group. The system treats the multiple identities as different providers and never captures the culprit. Providers can achieve the same result by using different names and addresses on bills. Analysis of provider performance is difficult or impossible when the provider cannot be accurately identified.
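One practical countermeasure is to resolve provider identity on more than the tax ID or NPI alone, for example by grouping bills on a normalized name and address and flagging groups that span several identifiers. A sketch, with field names assumed for illustration:

```python
import re
from collections import defaultdict

def normalize(text: str) -> str:
    """Lowercase and strip punctuation/whitespace so minor formatting differences don't split a provider."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def suspicious_provider_groups(bills):
    """Group bills by normalized provider name + address and keep groups spanning multiple tax IDs."""
    groups = defaultdict(list)
    for bill in bills:
        key = (normalize(bill["provider_name"]), normalize(bill["provider_address"]))
        groups[key].append(bill)
    return {key: group for key, group in groups.items()
            if len({bill["tax_id"] for bill in group}) > 1}
```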

7. Data as a work-in-process tool

Data can be used as a work-in-process tool for decision support, workflow analysis, quality measurement and cost assessment, among other initiatives. Timely, actionable information can be applied to workflow and to services to optimize quality performance and cost control.

Accurate and efficient claims data management is critical to quality, outcome and cost management. When data accuracy and integrity are overlooked as an important management responsibility, the organization will be hurt.

Integrating Strategy, Risk and Performance

While many (including me) talk about the need for integrating the setting and execution of strategy, the management of risk, decision-making and performance monitoring, reporting and management, there isn’t a great deal of useful guidance on how to do it well.

A recent article in CGMA Magazine, 8 Best Practices for Aligning Strategy, Planning and Risk, describes a methodology used by Mass Mutual that it calls the “Pinwheel.”

There are a number of points in the article that I like:

  • “Success in business is influenced by many factors: effective strategy and execution; deep understanding of the business environment, including its risks; the ability to innovate and adapt; and the ability to align strategy throughout the organization.”
  • “The CEO gathers senior corporate and business unit leaders off-site three times a year. As well as fostering transparency, teamwork and alignment, this ensures that the resulting information reaches the board of directors in time for its meetings….The result: The leadership team is more engaged in what the company’s businesses are doing, not just divisional priorities. This makes them more collaborative and informed leaders. This helps foster a more unified brand and culture across the organization.”
  • “A sound understanding of global business conditions and trends is fundamental to effective governance and planning.”
    Comment: Understanding the external context is critical if optimal objectives and strategies are to be set, with an adequate understanding of the risks inherent in each strategy and the relative merits of every option.
  • “Strategy and planning is a dynamic process, and disruptive innovation is essential for cultural change and strategic agility. Management and the board must continually consider new initiatives that may contribute to achieving the organization’s long-term vision and aspirations.”
  • Key risk indicators are established for strategies, plans, projects and so on.
  • “Evaluation and monitoring to manage risks and the overall impact on the organization is an ongoing process….Monitoring is a continuous, multi-layered process. In addition to quarterly monitoring of progress against the three-year operating plan and one-year budget, the company has initiated bottom-up ‘huddle boards’ that provide critical information across all levels of the organization.”
  • “Effective governance requires a tailored information strategy for the executive leadership team and the board of directors…. This should include: essential information needed to monitor and evaluate strategic execution of the organization; risks to the achievement of long-term objectives; and risks related to conforming to compliance and reporting requirements.”
  • “Integrating the ERM, FP&A and budget functions can help to manage risks effectively and to allocate limited capital more quickly and efficiently.”

I am not familiar with the company and its methodology, but based on the limited information in the article I think there are some areas for improvement:

1. Rather than selecting strategies and objectives and only then considering risk, organizations should make the consideration of risk a critical element of the strategy-selection process.

2. The article talks about providing performance and risk information separately to the corporate development and risk functions. Surely, this should be integrated and used primarily by operating management to adjust course as needed.

3. I am always nervous when the CFO and his team set the budget and there is no mention of how operating management participates in the process. However, it is interesting that the risk function at Mass Mutual is involved.

What do you think? I welcome your comments.