February 10, 2015
What Comes After Predictive Analytics
by Anand Rao
Predictive analytics can be helpful but has two clear problems. Prescriptive analytics and complexity science provide the next leap forward.
Historically, “analytics” has referred to the use of statistical or data mining techniques to analyze data and make inferences. In this context, analytics typically explains what happened (descriptive analytics) and why (diagnostic analytics). If an insurer saw its customers moving to its competition, it would analyze the characteristics of the customers staying or leaving, the prices it and its competitors offer, and customer satisfaction. The analysis would help determine what was happening, who was leaving and why. In contrast, predictive analytics focuses on what will happen in the future.
“Predictive analytics” has a fairly broad definition in the press but a specific meaning in academic circles. Classical predictive analytics focuses on building predictive models: a subset of the available data is used to fit a model with statistical techniques (usually some form of regression analysis, such as linear or logistic regression), which is then tested for accuracy against a “holdout” sample. Once a model with sufficient accuracy is developed, it can be used to predict future outcomes. More recent predictive analytics also draws on machine learning methods (e.g., neural networks or Bayesian probabilistic techniques).
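As a deliberately simplified sketch of this build-then-test workflow, the toy Python below fits an ordinary least-squares line on a training subset and then measures its error on the holdout sample. The ad-spend and sales figures are invented purely for illustration:

```python
# Classical predictive-analytics workflow in miniature: fit a simple model
# on a training subset, then check its accuracy on a "holdout" sample.

def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical data: weekly ad spend (x) vs. policies sold (y).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [12, 15, 21, 24, 30, 33, 40, 41]

train_x, train_y = x[:6], y[:6]   # subset used to build the model
hold_x, hold_y = x[6:], y[6:]     # holdout sample used to test it

a, b = fit_linear(train_x, train_y)

# Mean absolute error on the holdout sample gauges predictive accuracy.
mae = sum(abs((a * xi + b) - yi)
          for xi, yi in zip(hold_x, hold_y)) / len(hold_x)
```

Only once the holdout error is acceptably low would the model be used to predict future outcomes; real applications would use richer models (logistic regression, neural networks) and much larger samples.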
Insurers have used predictive analytics for almost two decades, but, despite its usefulness, it has two main drawbacks:
- Focus on decision versus action: Predictive analytics can tell you what is likely to happen but cannot make recommendations and act on your behalf. For example, a predictive model of flu can estimate its prevalence and spread but cannot tell you how to avoid infection. Similarly, a predictive model of insurance sales can forecast weekly sales numbers but cannot suggest how to increase them.
- Reliance on a single future versus multiple alternative futures: While we can learn from the past, we know that it may not be a good predictor of the future. Predictive models typically extrapolate linearly from past data. They also make certain assumptions that may not hold when projecting into the future. For example, regression requires the designation of a dependent variable (e.g., insurance sales), which is then described in terms of other independent variables (e.g., brand loyalty, price). While this method can help predict future insurance sales, the accuracy of the numbers tends to decrease further into the future, where broad macro-economic and behavioral considerations play a greater role in sales.
In response, there are a number of firms, authors and articles that propose “prescriptive analytics” as the next stage of the analytics continuum’s evolution. Prescriptive analytics automates the recommendation and action process and generally is based on machine learning techniques that evaluate the impact of future decisions and adjust model parameters based on the difference between predicted and actual outcomes. For example, insurers could use prescriptive analytics for automatically underwriting insurance, where the system improves its conversion ratio by adjusting price and coverage on a continual basis based on predicted take-up and actual deviations from it.
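The feedback loop behind such a system can be sketched in a few lines. In the toy below, the demand curve, target conversion ratio, price sensitivity and learning rate are all hypothetical: the system compares predicted with actual take-up, adjusts its model from the deviation, and re-prices toward the target:

```python
# Hedged sketch of a prescriptive feedback loop: predict take-up at the
# quoted price, observe actual take-up, update the model from the
# deviation, then prescribe a new price to hit the target conversion ratio.

TARGET = 0.60          # desired take-up (conversion) ratio
SENS = 0.08            # price sensitivity assumed by the model
LEARN_RATE = 0.5       # how strongly deviations adjust the model

def predicted_take_up(price, base):
    """The model's predicted probability a quote is accepted."""
    return base - SENS * price

def observed_take_up(price):
    """Stand-in for real market behavior, unknown to the model."""
    return 0.95 - 0.10 * price

base = 0.80                          # initial model intercept
price = (base - TARGET) / SENS       # price the model prescribes

for _ in range(100):
    deviation = observed_take_up(price) - predicted_take_up(price, base)
    base += LEARN_RATE * deviation   # adjust model parameters from deviation
    price = (base - TARGET) / SENS   # prescribe the next price
```

With these invented numbers the loop settles on the price at which actual take-up matches the target; a production underwriting engine would of course adjust coverage as well as price and learn far richer models.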
However, while prescriptive analytics does address the first of predictive analytics’ drawbacks by making and acting on its recommendations, it usually fails to address the second shortcoming. Prescriptive analytics relies on a single view of the future based on historical data and does not allow for “what if” modeling of multiple future scenarios. The critical assumption is that the variables used to explain the dependent variable are independent of each other, which in most cases is not true. While the analysis can be modified to account for this collinearity, the techniques still fail to use all of the available data from domain experts. In particular, prescriptive analytics does not take into account the rich structure and influences among all the variables being modeled.
In addition to prescriptive analytics, we believe that complexity science is a natural extension of predictive analytics. Complexity science is an inter-disciplinary approach to understanding complex systems, including how they form, evolve and cease to exist. Typically, a system that consists of a few well-known parts that consistently interact with each other in a way we can easily understand is a “simple” system. For example, a thermostat that can read (or sense) the temperature and reach a given target temperature is a simple system. At the other end of the spectrum, a system with a very large collection of entities that interact randomly with each other is a “random” system. We often use statistical techniques to understand the behavior of the latter. For example, we can gain an understanding of the properties of a liquid (like its boiling point) by looking at the average properties of the elements and compounds that compose it. The fundamental assumption about such systems is that their parts are independent.
In between simple and random systems are “complex” systems, which consist of many entities that interact with each other in meaningful ways that change their future path. For example, a collection of consumers watching advertisements, talking to others and using products can influence other consumers, companies and the economy as a whole. Complexity science rejects the notion of “independence” and actively models the interactions of entities that make up the system.
Complexity science identifies seven core traits of entities and how they relate to each other: 1) information processing, 2) non-linear relationships, 3) emergence, 4) evolution, 5) self-organization, 6) robustness and 7) operation at the edge of chaos. Unlike a random system, the entities in a complex system process information and make decisions. These information processing units influence each other, which results in positive or negative feedback leading to non-linear relationships. As a result, properties emerge from the interaction of the entities that did not originally characterize the individual entities. For example, when a new product comes on the market, consumers may purchase it not just because of its intrinsic value but also because of its real or perceived influence on others. Moreover, the interactions between entities in a complex system are not static; they evolve over time. They are capable of self-organizing and lack a central controlling entity. These conditions lead to more adaptive behavior. Such systems are often at the edge of chaos but are not quite chaotic or entirely random.
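A minimal, deliberately artificial illustration of this feedback and emergence: in the toy below, consumers on a ring network adopt a product once enough of their neighbors have. One seed adopter changes nothing, while two nearby seeds trigger a full cascade, an outcome that is not visible in any individual agent's rule:

```python
# Threshold-adoption toy model: each of N consumers watches the two
# neighbors on either side and adopts once at least two of the four have.
# Emergence: the system-level outcome depends non-linearly on the seeds.

N = 100
neighbors = {i: [(i - 2) % N, (i - 1) % N, (i + 1) % N, (i + 2) % N]
             for i in range(N)}

def run(seed_positions, steps=60):
    """Seed some early adopters, let influence propagate, count adopters."""
    adopted = [False] * N
    for s in seed_positions:
        adopted[s] = True
    for _ in range(steps):
        for i in range(N):
            if not adopted[i] and sum(adopted[j] for j in neighbors[i]) >= 2:
                adopted[i] = True
    return sum(adopted)

lone = run([50])        # one early adopter: no one else ever adopts
pair = run([50, 52])    # two nearby adopters: a market-wide cascade
```

Nothing in the individual rule (“adopt when half your neighbors have”) says that one seed fizzles while two seeds sweep the whole market; that qualitative jump is the emergent, non-linear behavior the traits above describe.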
Two parallel developments have led to complexity science’s increased use in practical applications in recent years. The first is the availability of large amounts of data (or big data) that allows us to capture the properties of interest within each entity and the interactions between them. Processing the data allows us to model each entity and its interactions with others individually, as opposed to treating them as an aggregate. For example, a social network is a complex system of interacting individuals. We can use complexity science to understand how ideas flow through the social network, how they become amplified and how they fade away.
The second development accelerating complexity science’s use is the inability of classical statistical models to adequately capture complexity in the global economy. Since the financial crisis of 2007/8, a number of industry bodies, academics and regulators have called for alternative ways of looking at the world’s complex social and financial systems. For example, the Society of Actuaries has published a number of studies using complexity science and a specific type of complexity science called agent-based modeling to better understand policyholder behavior. In addition, health insurers are building sophisticated models of human physiology and chemical reactions to test adverse drug interactions. As another example, manufacturers are modeling global supply chains as complex interacting entities to increase their robustness and resiliency.
Agent-based modeling is a branch of complexity science where the behavior of a system is analyzed using a collection of interacting, decision-making entities called agents (or software agents). The individual behavior of each agent is modeled based on available data and domain knowledge. The interaction of these agents among themselves and the external environment can lead to market behavior that is more than just the aggregate of all the individual behaviors. This often leads to emergent properties. Such models can be used to evaluate multiple scenarios into the future to understand what will happen or what should happen as a result of a certain action. For example, a large annuity provider has used individual policyholder data to create an agent-based model in which each one of its customers is modeled as an individual software agent. Based on specific policyholder data, external socio-demographic and behavioral data, as well as historical macro-economic data, the annuity provider can evaluate multiple scenarios on how each annuity policyholder will lapse, withdraw or annuitize their policy under different economic conditions.
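In the same spirit, though vastly simpler than any production model, the sketch below treats each policyholder as an individual software agent whose lapse decision depends on its own rate sensitivity, on peer behavior and on the economic scenario. Every rule, coefficient and scenario here is hypothetical:

```python
# Tiny agent-based model: the same book of business is simulated under
# multiple interest-rate scenarios, and the lapse rate emerges from
# individual agents influencing one another.

import random

class Policyholder:
    """One customer, modeled as an individual decision-making agent."""
    def __init__(self, rng):
        self.rate_sensitivity = rng.uniform(0.0, 1.0)
        self.lapsed = False

    def step(self, market_rate, peer_lapse_share, rng):
        if self.lapsed:
            return
        # Lapse pressure grows with outside interest rates and with the
        # share of peers who have already lapsed (social influence).
        pressure = (0.04 + 0.5 * self.rate_sensitivity * market_rate
                    + 0.2 * peer_lapse_share)
        if rng.random() < pressure:
            self.lapsed = True

def simulate(market_rate, years=10, n=2000, seed=0):
    rng = random.Random(seed)
    agents = [Policyholder(rng) for _ in range(n)]
    for _ in range(years):
        share = sum(a.lapsed for a in agents) / n
        for a in agents:
            a.step(market_rate, share, rng)
    return sum(a.lapsed for a in agents) / n

# Evaluate multiple future scenarios rather than a single projection.
low_rate_lapse = simulate(market_rate=0.02)
high_rate_lapse = simulate(market_rate=0.08)
```

Running the same population through different scenarios is what distinguishes this approach from a single-future predictive model: the output is a range of lapse outcomes conditioned on economic paths, not one extrapolated number.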
In conclusion, as companies look to capitalize on big data opportunities, we will see more of them adopt prescriptive analytics and complexity science to predict not just what is likely to happen based on past events but also how they can change the future course of events given certain economic, political and competitive constraints.