July 8, 2016
Data Science: Methods Matter (Part 4)
Putting models into production is the fun part -- but requires not only testing but also a plan for monitoring and updating as time goes on.
Putting a data science solution into production after weeks or months of hard work is undoubtedly the most fun and satisfying part. Models do not exist for their own sakes; they exist to make a positive change in the business. Models that are not in production have not realized their true value. Putting models into production involves not only testing and implementation, but also a plan for monitoring and updating the analytics as time goes on. We’ll walk through these in a moment and see how the methods we employ will allow us to get the maximum benefit from our investment of time and effort.
First, let’s review briefly where we’ve been. In Part 1 of our series on Data Science Methods, we discussed CRISP-DM, a data project methodology that is now in common use across industries. We looked at the reasons insurers pursue data science at the first step, project design. In Part 2, we looked at building a data set and exploratory data analysis. In Part 3, we covered what is involved in building a solution, including setting up the data in the right way to validate the solution.
Now, we are ready for the launch phase. Just like NASA, data scientists need green lights across the board, only launching when they are perfectly ready and when they have addressed virtually every concern.
Test and Implement
Once an analytic model has been built and shown to perform well in the lab, it’s time to deploy it into the wild: a real live production environment. Many companies are hesitant to simply flip a switch and move their business processes from an old approach to a new one. They prefer to implement a solution cautiously, in steps or phases, often through either an A/B test (test and control) or a phased geographic deployment. In an A/B test, the business results of the new analytic solution are compared with those of the solution used in the past. For example, leads in a marketing campaign are randomly allocated, 50% to the new approach and 50% to the old one. If the results from the new solution are superior, it is fully implemented and the old solution is retired. In a phased geographic deployment, the solution is launched in one region of the country first and, if results there look promising, rolled out nationwide.
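To make the A/B idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a prescribed implementation: the function names, the fixed random seed and the choice of a two-proportion z-test to compare response rates are all assumptions added here, and a real campaign might call for a different comparison.

```python
import math
import random


def assign_arms(lead_ids, new_share=0.5, seed=42):
    """Randomly allocate each lead to the "new" or "old" approach.

    Hypothetical helper: new_share is the fraction sent to the new
    solution (0.5 mirrors the 50/50 split described in the text).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    return {
        lead_id: ("new" if rng.random() < new_share else "old")
        for lead_id in lead_ids
    }


def two_proportion_z(conv_new, n_new, conv_old, n_old):
    """Compare two response rates with a pooled two-proportion z-test.

    Returns (z, two_sided_p_value). The z-test is an assumed choice
    of comparison, not something the article specifies.
    """
    p_new = conv_new / n_new
    p_old = conv_old / n_old
    pooled = (conv_new + conv_old) / (n_new + n_old)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_old))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_new - p_old) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    arms = assign_arms(range(10_000))
    print(sum(1 for arm in arms.values() if arm == "new"), "leads in the new arm")
    # Illustrative counts only, not real campaign data.
    print(two_proportion_z(conv_new=120, n_new=1000, conv_old=100, n_old=1000))
```

A positive z with a small p-value would support moving fully to the new solution. The threshold for "small" is a business decision that should be set before the test starts.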
Depending on the computing platform, the code base of the analytic solution may be automatically dropped into existing business processes. Scores may be generated live or in batch, depending on the need. Marketing, for instance, would be a good candidate to receive batch-processed results. The data project may have been designed to pre-select good candidates for insurance who are also likely to respond. A batch run would then return the entire group of qualifying prospects within the data pool.
Live results meet a completely different set of objectives. Giving a broker a real-time indication of our appetite to quote a particular piece of business would be a common use of real-time scoring.
Sometimes, moving a model to production requires additional coding. This occurs, for example, when a model is built and proven in R but the deployed version has to be implemented in C for performance or platform reasons. The code has to be translated into the new language, and checks must then confirm that the input variables, the final scores and the values passed to end users all match those of the original model.
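One way to run such a check is to score the same records with both versions and compare the outputs. The sketch below assumes the scores from each implementation have already been exported and loaded as dictionaries keyed by record ID. The function name, the record IDs and the tolerance are hypothetical choices for illustration.

```python
import math


def compare_scores(reference, deployed, abs_tol=1e-6):
    """Compare scores from the original model with the translated one.

    reference and deployed map a record ID to a score. abs_tol is an
    assumed tolerance for floating-point differences between languages.
    """
    ref_ids, dep_ids = set(reference), set(deployed)
    shared = sorted(ref_ids & dep_ids)
    return {
        # Records the deployed version failed to score.
        "missing": sorted(ref_ids - dep_ids),
        # Records scored in production that the reference never saw.
        "extra": sorted(dep_ids - ref_ids),
        # Records where the two scores differ by more than abs_tol.
        "mismatched": [
            (rid, reference[rid], deployed[rid])
            for rid in shared
            if not math.isclose(
                reference[rid], deployed[rid], rel_tol=0.0, abs_tol=abs_tol
            )
        ],
    }


if __name__ == "__main__":
    # Made-up scores for illustration only.
    reference = {"A1": 0.731, "A2": 0.105, "A3": 0.512}
    deployed = {"A1": 0.731, "A2": 0.115, "A4": 0.900}
    print(compare_scores(reference, deployed))
```

An empty report on a representative sample is a reasonable precondition for launch. Any entry in it points to a translation error to investigate before users see the scores.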
Monitor and Update
Some data projects are “one time only.” Once the data has answered the question, business strategies can be developed to act on that answer. Others, however, are designed for long-term use and re-use. These can be very valuable over their periods of use, but special considerations must be taken into account when the plan is to reuse the analytic components of a data project. If a model’s performance starts to change over time, you want to manage that change as it happens. Monitoring and updating will help the project hold its business value, as opposed to letting its value decrease as variables and circumstances change. Effective monitoring is insurance for data science models.
For example, a model designed to repeatedly identify “good” candidates for a particular life product may give excellent results at the outset. As the economy changes, or demographics change, credit scoring may exclude good candidates. As health data exchanges improve, new data streams may be better indicators of overall health. Algorithms or data sets may need to be adapted. Minor tweaks may be needed, or a whole new project may prove to be the best option if business conditions have drastically changed. Comparing current business results with the intended results and with the results achieved at the outset will allow insurers to identify features of the analysis that no longer provide valid results.
Monitoring is important enough that it must go beyond running periodic reports and trusting a hunch that the models have not lost effectiveness. Monitoring needs its own plan. How often will reports run? What are the criteria we can use to validate that the model is still working? Which indicators will tell us that the model is beginning to fail? These criteria are identified by both the data scientists and the business users who are in touch with the business strategy. Depending on the project and previous experience, data scientists may even know intuitively which components within the method are likely to slide out of balance. They can create criteria to monitor those areas more closely.
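As one illustration of a measurable criterion, the sketch below computes a Population Stability Index (PSI) between the scores a model produced at launch and the scores it produces today. PSI is a technique introduced here for illustration; the article does not prescribe it. The function name, the bin count and the alert threshold are likewise assumptions that a real monitoring plan would need to set.

```python
import math


def population_stability_index(baseline, current, n_bins=10, eps=1e-6):
    """Measure how far the current score distribution has drifted.

    Bins are equal-width over the baseline range; current values outside
    that range are clipped into the first or last bin. eps keeps empty
    bins from producing a division by zero or log(0).
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width) if width > 0 else 0
            counts[min(max(idx, 0), n_bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    base_shares = bin_shares(baseline)
    curr_shares = bin_shares(current)
    return sum(
        (c - b) * math.log(c / b) for b, c in zip(base_shares, curr_shares)
    )


if __name__ == "__main__":
    # Synthetic scores for illustration only.
    launch_scores = [i / 1000 for i in range(1000)]
    todays_scores = [0.5 + s / 2 for s in launch_scores]
    psi = population_stability_index(launch_scores, todays_scores)
    ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb cutoff, not a standard
    print(f"PSI = {psi:.3f}", "ALERT" if psi > ALERT_THRESHOLD else "ok")
```

A scheduled job could run a check like this on each reporting cycle and flag the model for review when the index crosses the agreed threshold. That turns "Which indicators will tell us that the model is beginning to fail?" into a concrete, auditable rule.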
Updating the model breathes new life into the original work. Depending on what may be happening to the overall solution, the data scientist will know whether a small tweak to a formula is called for or an entirely new solution needs to be built based on new data models. An update saves as much of the original time investment as possible without jeopardizing the results.
Though the methodology may seem complicated, and there seem to be many steps, the results are what matter. Insurance data science continually fuels the business with answers of competitive and operational value. It captures accurate images of reality and allows users to make the best decisions. As data streams grow in availability and use, insurance data science will be poised to make the most of them.