Insurance companies have historically struggled with the challenges posed by claims litigation and the threat of attorney involvement across multiple lines of business. According to the Insurance Information Institute, 39 cents of every dollar of loss costs in commercial multi-peril went toward defense and cost containment. For medical professional liability, the figure rises to 43 cents, and for product liability it is as high as 77 cents. Even in workers’ compensation (WC), where the employee gives up the right to sue the employer for workplace injuries, the figure amounts to 13 cents.
In 2014, the California Workers’ Compensation Institute analyzed attorney involvement in California WC claims. Over the six-year period studied, attorneys were involved in 12% of all claims (including medical-only cases), 38% of lost-time claims and 80% of permanent disability claims. Although the report discussed multiple efforts by lawmakers to reform California’s WC laws to reduce costs, it noted: “Despite those efforts, the litigation rate has nearly doubled for all workers' compensation claims, and more than tripled for claims involving lost time.”
With so much money at risk, it’s no wonder that companies are investing in claims system technology and advanced analytics to reduce the impact of litigation spend on their bottom line. This article shows how advanced analytics and data mining can be used early in a claim’s life cycle to identify litigation-prone claims and triage them appropriately.
Setting the Stage
Cases with heavy litigation expenditures typically involve multiple parties connected in complex ways, with differing and sometimes opposing incentives. The ultimate cost of litigation is driven by numerous factors, including the duration of settlement discussions and trial (if applicable), the cost of medical experts, discovery, depositions, attorney fees, the responsiveness of the plaintiff’s attorney, the impact of high/low agreements, the appeals process and more.
As a result, insurance litigation comes with a number of challenges that have historically made it difficult to predict litigation outcomes (e.g., dismiss, defend, settle, alternative dispute resolution, probability of winning). Traditional approaches have tended to focus on historical reporting and backward-looking analyses of litigation rates, costs and trends. Such “hindsight”-focused measures, however, are reactive by nature. In many situations, it has been difficult to segment litigation outcomes, especially early in a claim’s life cycle, when an adjuster can make a real difference in the claim’s trajectory. For that reason, a number of innovative insurers have begun shifting to more predictive, forward-looking solutions, including predictive analytics.
The Inspiration for Litigation Analytics
Insurance companies have largely been using data analytics to attack claim severity in lines such as WC, medical professional liability, general liability and auto liability bodily injury. By matching claim complexity with the appropriate resource skill set as early as first notice of loss (FNOL), insurers have gained significant efficiencies that help reduce claim durations and costs. Claim predictive models have helped insurers better segment and triage high-severity workers’ compensation and bodily injury claims, driving up to a 10-point reduction in claims spend.
Models focused on claim severity can naturally be extended to other business areas, including medical management, special investigative unit (SIU) referrals and litigation management. We have seen claim cost models applied in these areas because the most severe claims also tend to be the most complex. For example, the most expensive 10% of bodily injury claims, as predicted by these severity models, can be as much as six times more likely to go to litigation and more expensive to litigate. In WC, the most expensive 10% of claims can be as much as three times more likely to go to litigation and even more expensive to litigate. Clearly, there is plenty of segmentation power to be gained, even more so if the models are specifically developed to predict litigation.
Data Used
Data is the first building block of any analytics journey. The ability of actuaries and data scientists to effectively identify litigation-prone claims can be attributed to the power of advanced analytics, the growth of big data, and inexpensive computing power and storage. The data used in developing litigation models is similar to that used for claim-severity models: internal and external third-party data, structured and unstructured data, direct-pull fields and synthetically created variables. The large number and diversity of data sources, sometimes yielding more than a thousand candidate variables, provide unique information for segmentation and analysis, helping to answer the question: which combinations of complex patterns make a claim more prone to litigation?
Some of the data factors typically used in litigation models are quite intuitive, such as claimant age and gender, accident jurisdiction and claim history. Unstructured data, such as the injury description and accident narrative, are often valuable sources of information that may help uncover indicators and behavioral clues strongly correlated with future litigation likelihood. Text mining can be used to delve into such free-form data and help identify co-morbidities that significantly drive up claim severity. Additionally, third-party data on the claimant’s lifestyle and habits adds a layer of information that further helps to segment the litigation propensity of the claim.
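As a rough illustration of how free-form narratives can become model inputs, the sketch below turns a few keyword patterns in an injury description into 0/1 flags. The pattern list, flag names and the `text_flags` function are hypothetical; real text-mining pipelines typically derive terms from historical claim notes and may use richer techniques than regular expressions.

```python
import re

# Hypothetical indicator patterns; a real pipeline would curate or learn these
# from historical claim notes rather than hard-code them.
INDICATOR_PATTERNS = {
    "back_injury": r"\b(?:back|lumbar|spine|spinal)\b",
    "comorbidity_diabetes": r"\bdiabet(?:es|ic)\b",
    "comorbidity_obesity": r"\bobes(?:e|ity)\b",
    "attorney_mention": r"\b(?:attorney|lawyer|counsel)\b",
}

# Compile once, case-insensitively, so narratives in any casing match.
COMPILED = {
    name: re.compile(pattern, re.IGNORECASE)
    for name, pattern in INDICATOR_PATTERNS.items()
}


def text_flags(narrative: str) -> dict:
    """Return a 0/1 flag for each indicator pattern found in a claim narrative."""
    return {name: int(bool(rx.search(narrative))) for name, rx in COMPILED.items()}


print(text_flags("Claimant reports lumbar strain; history of diabetes. Spoke to a lawyer."))
# {'back_injury': 1, 'comorbidity_diabetes': 1, 'comorbidity_obesity': 0, 'attorney_mention': 1}
```

Flags like these would sit alongside structured fields as candidate variables; the model, not the keyword list, decides how much each one matters.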
Analytics Techniques Used
A number of modeling techniques can be used to predict the likelihood that a claim will move to litigation, and many of them generally perform well when used within a robust end-to-end modeling process that actively involves end users from day 1. From multivariate predictive modeling and machine learning techniques to neural networks, various methodologies are available to identify the most predictive variables. However, as we noted in the article “The Challenges of Implementing Advanced Analytics,” it is important to balance building a high-precision statistical model with being able to interpret and consume its results. Our experience has shown that it is more valuable to use less complex models that end users can easily interpret than to pursue highly precise, complex models that are hard to understand and consume.
Models are typically trained on historical data with a defined target variable (i.e., what the model is trying to predict). Example target variables include a binary 0-1 field indicating whether a claim moved to litigation (“1”) or not (“0”); litigation dollars, measuring how expensive claims already in litigation turn out to be; a proxy for either; or a combination of both. Models are also validated on a holdout sample of claims to assess their robustness.
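To make these target definitions concrete, here is a minimal sketch assuming a simplified claim record with a litigation indicator and litigation dollars. The `Claim` class and helper names are hypothetical, and multiplying a litigation probability by the expected cost given litigation is shown only as one common way to combine the two targets, not as the approach used in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    claim_id: str
    litigated: bool            # did the claim move into litigation?
    litigation_dollars: float  # attorney fees and expenses; 0.0 if not litigated


def binary_target(claim: Claim) -> int:
    """Binary 0-1 target: 1 if the claim moved to litigation, else 0."""
    return 1 if claim.litigated else 0


def cost_target(claim: Claim) -> Optional[float]:
    """Litigation-dollar target, defined only for claims already in litigation."""
    return claim.litigation_dollars if claim.litigated else None


def expected_litigation_cost(p_litigation: float, cost_if_litigated: float) -> float:
    """One way to combine the two targets: P(litigation) x E[cost | litigation]."""
    return p_litigation * cost_if_litigated
```

Returning `None` for unlitigated claims keeps them out of a cost-only model rather than silently treating them as zero-cost litigation.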
Importantly, models can be built using data available at FNOL, or day 1, helping insurers take expedited business actions and make important decisions early in the claim’s life cycle. As additional data becomes available over time, the models can incorporate it to refine their predictions in the weeks and months that follow.
Claims Systems Are Differentiators
With the newest claims systems, insurance companies are achieving better claim outcomes and spending less on loss adjustment expense. The days of claims systems serving only as record-keeping solutions are passé. Through self-service portals and intuitive websites, the newest technology lets claimants check the status of their claims directly, regardless of the time of day or their location. But these capabilities are not just for “external” system users. “Internal” users can now leverage advanced analytics and spend less time on administrative tasks (e.g., manually populating spreadsheets), shifting their focus to working with insureds and improving their claims experience.
Litigation Models in Action
A number of models can be built to identify which claims are likely to be more complex and involve litigation. For example, an insurance company could build a model that answers the following question: Of the claims that go to litigation, which ones are likely to be the most expensive? A high score means the claim has a high likelihood of generating large litigation expenses, which suggests that the most experienced internal resources and attorneys should be focused on it.
Data used and target variables
For the case study at hand, a population of more than 10,000 bodily injury claims spanning multiple accident years was studied. For each claimant, many characteristics of the claim, claimant, accident, injury and suit (if the claim was litigated) were collected and recorded in a database. External third-party data, such as the vehicle identification number (VIN) and geo-demographic and behavioral data at the household and census block level, were also added to capture more information.
The target variable was defined as all dollars spent on litigation, including attorney fees and expenses. A predictive model was then built using a standard train/test/validation methodology.
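A train/test/validation split can be implemented in several ways. One simple option, shown below under the assumption that each claim has a stable ID, hashes the claim ID so that a claim always lands in the same partition across reruns. The 60/20/20 proportions and the `assign_split` function are illustrative, not details of the case study.

```python
import hashlib


def assign_split(claim_id: str, train: float = 0.6, validation: float = 0.2) -> str:
    """Deterministically assign a claim to 'train', 'validation' or 'test'.

    Hashing the claim ID keeps assignments stable across reruns, so a claim
    reserved for testing never drifts into the training data.
    """
    digest = hashlib.sha256(claim_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    if bucket < train:
        return "train"
    if bucket < train + validation:
        return "validation"
    return "test"
```

Because the case study spans multiple accident years, an out-of-time holdout (for example, reserving the most recent accident year for testing) is another reasonable design.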
Model results and output
The resulting models exhibited strong segmentation on the holdout sample. For example, litigation costs for the highest-scoring 10% of claims were almost double the population average, while the lowest-scoring 10% of claims had litigation costs less than half those of the average claim. This segmentation is even more impressive considering it was achieved at day 1, not weeks or months into the life of the claim.
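Segmentation results like these are commonly summarized as decile relativities: rank holdout claims by model score, split them into ten equal groups and divide each group's average cost by the overall average. The sketch below assumes parallel lists of scores and litigation costs; `decile_relativities` is a hypothetical helper, not part of any library.

```python
from statistics import mean
from typing import List, Sequence


def decile_relativities(scores: Sequence[float], costs: Sequence[float]) -> List[float]:
    """Average cost per score decile divided by the overall average cost.

    Index 0 is the lowest-scoring 10% of claims, index 9 the highest-scoring 10%.
    """
    if len(scores) != len(costs) or len(scores) < 10:
        raise ValueError("need matching score/cost lists with at least 10 claims")
    overall = mean(costs)
    if overall == 0:
        raise ValueError("overall average cost is zero")
    # Sort claims by model score, keeping each cost paired with its score.
    ranked = [cost for _, cost in sorted(zip(scores, costs), key=lambda p: p[0])]
    n = len(ranked)
    relativities = []
    for d in range(10):
        # Integer boundaries split n claims into ten (near-)equal groups.
        group = ranked[d * n // 10:(d + 1) * n // 10]
        relativities.append(mean(group) / overall)
    return relativities
```

In this framing, the case study's result corresponds to a top-decile relativity close to 2.0 and a bottom-decile relativity below 0.5.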
The model contained about 30 predictive variables, some of which were intuitive and readily available (e.g., claimant age and gender, accident location and accident type, such as parking lot or intersection). The model also included information sourced from third-party vendors (e.g., census employment statistics) and proxies for behavioral factors (e.g., the distance between the accident location and the claimant’s residence, or the time lag before the claim was reported). External geo-demographic data about the claimant were also beneficial (e.g., population density in the ZIP code of residence), as were National Highway Traffic Safety Administration (NHTSA) statistics on fatal accidents in the accident ZIP code.
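Behavioral proxies such as these are usually synthetic variables derived from raw fields. A minimal sketch, assuming the accident location and the claimant’s residence have already been geocoded to latitude/longitude and that accident and report dates are available: the haversine formula gives a great-circle distance, and the reporting lag is a simple date difference. The function names are hypothetical.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def report_lag_days(accident_date: date, report_date: date) -> int:
    """Days between the accident and the date the claim was reported."""
    return (report_date - accident_date).days
```

Straight-line distance is only a proxy; driving distance or a simple "outside home ZIP code" flag are alternatives an insurer might test.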
Bringing Models to Life
Building a predictive model like the one described above is important, but it is only beneficial if the model changes behaviors, decisions and actions. The insights derived from these models help insurance companies take direct action on their claim triage strategies, attorney selection and defense strategies. Business rules can be carefully crafted to support claim examiners in their decision-making. When an adjuster understands that a high-scoring claim has a higher risk of moving to litigation and costing more, defense strategies can be adjusted accordingly. From assigning external defense counsel to making settle-or-defend decisions based on case dynamics, insurance companies can alter their event management, resource allocation and escalation decisions earlier in the claim’s life cycle.
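Business rules of this kind can be as simple as score thresholds mapped to handling tiers. The sketch below is hypothetical: the thresholds, tier names and recommended actions are placeholders that an insurer would calibrate with its claims and legal teams, not rules from the case study.

```python
from dataclasses import dataclass


@dataclass
class TriageDecision:
    tier: str
    action: str


def triage(score: float, high: float = 0.8, low: float = 0.2) -> TriageDecision:
    """Map a litigation-model score in [0, 1] to a hypothetical handling tier."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        # Placeholder action: route to the most experienced resources early.
        return TriageDecision("high", "senior adjuster; early defense strategy review")
    if score <= low:
        return TriageDecision("low", "standard handling; monitor for new information")
    return TriageDecision("medium", "experienced adjuster; reassess as new data arrives")
```

Keeping the rules this transparent matches the article's preference for interpretable outputs: an examiner can see exactly why a claim was escalated.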
Carpe Diem With Analytics
The insurance claims landscape is becoming more complex, competitive, fast-moving and disrupted. There is little doubt that adopting big data, data science and analytics is important to becoming more agile in this environment, helping insurance companies make better decisions within days of receiving a claim. With the underwriting cycle indicating another period of softening rates, and interest rates hovering at record-low levels, tapping savings in litigation spend might be just what the doctor ordered for insurance companies brave enough to seize the opportunity. As Larry Winget wrote in his book It’s Called Work for a Reason, “Knowledge is not power; the implementation of knowledge is power.” The knowledge and analytics to reduce litigation costs exist today. We believe the time has come to implement that knowledge.
As used in this document, “Deloitte” means Deloitte Consulting LLP, a subsidiary of Deloitte LLP. Please see www.deloitte.com/us/about for a detailed description of the legal structure of Deloitte LLP and its subsidiaries. Certain services may not be available to attest clients under the rules and regulations of public accounting.
This communication contains general information only, and none of Deloitte Touche Tohmatsu Limited, its member firms, or their related entities (collectively, the “Deloitte Network”) is, by means of this communication, rendering professional advice or services. Before making any decision or taking any action that may affect your finances or your business, you should consult a qualified professional adviser. No entity in the Deloitte Network shall be responsible for any loss whatsoever sustained by any person who relies on this communication.