
Leveraging Data Science for Impact

As insurers strive to become more relevant to their customers and more efficient, they have embraced the strategic importance of their data. Insurance companies have been using various data streams to predict property damage and loss for generations. But while they have been collecting increasingly large stockpiles of consumer data, until recently they have lacked the tools and talent to operationalize it — particularly with the level of transparency required by regulatory bodies — to drive better products and services and operational efficiencies.

Advances in AI and machine learning have enabled insurers to improve the customer experience and boost policyholder retention while cutting claims handling time and costs, eliminating fraud and protecting against cybercrime. These new tools and platforms have generated increased interest in using data science across the industry, and insurance companies have been investing accordingly.

According to a recent study, 27% of large life/annuities insurers and 35% of large property/casualty insurers are expanding their data science efforts to some degree, while 13% of large life/annuity insurers are piloting an initiative. Midsize insurers are similarly active in the space, with 20% of life/annuity carriers and 24% of property/casualty carriers looking to expand their data science efforts.

See also: Turning Data Into Action  

But while investments in AI are growing, insurance organizations are often finding that their existing analytics and business intelligence technology and talent aren’t capable of meeting their current and expanding needs. Challenges in resources, technology infrastructure and the ability to operationalize models quickly and efficiently can prevent insurers from fully leveraging AI and data science to drive business impact. To overcome these challenges, and maximize the ROI on AI investments, insurance companies must look to innovative solutions such as data science automation.

While data science is becoming a valuable tool in the insurance industry, implementing a data science program is not easy. A typical enterprise data science project is highly complex and requires assembling an interdisciplinary team of data engineers, developers, data scientists, subject matter experts and other specialists. This talent is scarce and costly, and the approach is neither scalable nor sustainable for most insurance organizations.

Data science automation platforms fully automate the data science process, including data preparation, feature engineering, machine learning and the production of data science pipelines – enabling insurance organizations to execute more business initiatives while maintaining the current investments and resources. Data science automation allows data scientists to focus on what to solve rather than how to solve. End-to-end data science automation makes it possible to execute data science processes faster, often in days instead of months, with unprecedented levels of transparency and accountability. As a result, insurance organizations can rapidly scale their AI/ML initiatives to drive transformative business changes.

There are several key areas where data science automation can make a big impact in the insurance industry. For increasing operational efficiency, AI-based automatic underwriting and claims management will be a major trend that we will see in coming years. In customer relationship management, AI will be used more frequently to help profile customer behaviors, helping insurers to get a better and deeper understanding of their customers’ wants and needs. This, in turn, will help to drive revenue growth.

See also: Role of Unstructured Data in AI  

In the near future, data science and AI will be widely implemented in the insurance industry, and the barrier to adoption for data science and AI will become low. Once this happens, accumulated critical use cases will be key differentiators for insurance companies implementing these technologies. Data science automation accelerates the data science process, enabling insurers to explore 10X more use cases than with the traditional method of data science. Early adopters have already started to leverage automation to scale their data science initiatives.

3 Steps to Demystify Artificial Intelligence

Artificial intelligence is the new electricity. We hear it will fundamentally shift the balance of power between labor and capital, mostly by rendering labor obsolete. It will enable and empower transformative technologies that will rearrange the sociopolitical landscape and may lead to humanity’s transcendence (or extinction) within our lifetimes. As it changes the world, it will necessarily rewrite the rules of insurance. That’s the myth, and the nature of the headlines.

Interestingly, insurance is heavy on intellectual property (think of proprietary underwriting models), technology and data. And AI is hungry; hungry for data, of course, but also hungry for systems that can be automated and for proprietary classification problems that can be improved. That places insurance right in the appetite of artificial intelligence and its promise of transformation. If we want to act on artificial intelligence’s transformational potential, we need to understand what it actually is, separate the technologies from the hype and develop a practical understanding of what is required to implement AI-powered solutions in the insurance sector. This article highlights these three steps and offers a realistic approach for carriers to take advantage of the opportunities.

Defining Artificial Intelligence

Unfortunately, our first step is also our hardest, as a working definition of artificial intelligence is difficult to pin down. The scope of the term AI is broad, and it requires careful consideration to avoid becoming hopelessly confounded with its own hype. It is also challenging to come to a clear definition of natural intelligence, which leaves us struggling to define artificial intelligence because the latter is so often compared to the former.

AI tends to be discussed in two flavors. The first is general artificial intelligence (also, artificial general intelligence and strong AI). GAI is machinery capable of human-level cognition, including a general problem-solving capability that is potentially self-directed and broadly applicable to many kinds of problems. GAI references are accessible through fictional works, such as C-3PO in Star Wars or the eponymous robot of Pixar’s WALL-E. The most important feature of GAI is that it does not currently exist, and there is deep debate about whether it ever will.

The second is usually referred to as narrow AI. Narrow AI is task-specific and non-generalizable. Examples include facial recognition on Apple’s iPhone X and speech-to-text transcription by Amazon’s Alexa. Narrow AI looks and feels a lot like software or, perhaps, predictive models. Narrow AI can be described as a class of modeling techniques that fall under the category of machine learning.

See also: Seriously? Artificial Intelligence?  

What is machine learning? Imagine a set of input data; this data has one or more potential features of interest. Machine learning is a technique for mapping the features of input data to a useful output. It is characterized by statistical inference, as advanced statistical techniques often underlie machine learning predictive models. Through statistical modeling, software can infer a likely output given a set of input features. The predictive accuracy of machine learning methods increases as their training data sets grow. As the machine ingests more data, it is said to learn from that data. Hence, machine learning.
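Since machine learning is defined above as mapping input features to a useful output, a from-scratch toy can make the idea concrete. The sketch below is a nearest-centroid classifier; the feature names, training data and risk labels are invented for illustration only.

```python
# Toy illustration of machine learning as "mapping features to an output":
# a from-scratch nearest-centroid classifier. All data here is invented.

def train(examples):
    """Learn one centroid (mean feature vector) per class label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Infer the likeliest label: the class whose centroid is closest."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Hypothetical features: (claims per year, years insured) -> risk tier
training = [
    ((0.0, 10.0), "low"), ((0.2, 8.0), "low"),
    ((2.0, 1.0), "high"), ((1.8, 2.0), "high"),
]
model = train(training)
print(predict(model, (0.1, 9.0)))   # -> low
print(predict(model, (2.1, 1.5)))   # -> high
```

With more (and more varied) training examples, the centroids become better estimates, which is the sense in which predictive accuracy grows with the size of the training set.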

Perhaps most important of all, machine learning (as an implementation of narrow AI) is real and here today; for the remainder of our discussion when we say “AI,” we mean narrow AI or machine learning.

Beyond the Hype

The hype around AI and its potential is extensive. Silicon Valley billionaires opine on the potential implications of the technology, including comparing its power to nuclear weapons. Articles endlessly debate if and how quickly AI will put vast swaths of white-collar workers out of work. MIT’s Technology Review provides a nice summary of the literature, stating that up to half of all jobs worldwide could be eliminated in the next few decades.

AI may well have this kind of impact. And the social, political and economic implications of that impact, especially around questions of potential large-scale unemployment, deserve careful long-term consideration. However, executives and business owners need to evaluate technology investments today to improve their current competitive position. From that perspective, we find it more practical to focus on examining which existing tasks could be automated by AI today.

Enter Pigeons

In 2012, researchers trained pigeons to recognize people based only on their faces as part of a study on cognition. Suppose you had millions of face-recognizing pigeons; this force of labor could be deployed in a comprehensive facial recognition system, remarkably similar in function to the facial recognition AI of devices like modern smartphones. It turns out pigeons have also been trained to recognize voices, spot cancers on X-rays and count, among a host of other tasks related to headline-grabbing AI achievements.

The metaphor is admittedly silly. Instead of pigeons, imagine an army of virtual robots capable of classifying information from the real world to produce a machine-readable data set. In machine learning language, these robots take unstructured data and make it structured. Said robots resemble the automation machinery of a factory; like spot welders tirelessly joining steel members to form automobile frames, our virtual robots tirelessly recognize whether a face is featured in a photograph. In contemplating what could be automated with AI, a useful starting place is that army of robots (or pigeons!). For example:

  • What existing analyses could be improved or optimized? Could pricing or underwriting be improved using better classifiers or non-linear modeling approaches?
  • What data currently exist at the firm that could be made available for new types of analysis? Claims adjusters’ notes can be processed by natural language algorithms and cross-referenced with photos of physical damage or prior inspections.
  • What data would you analyze if it could be made available? What if you could listen to all the policyholder calls received by your customer service department and annotate which questions stumped the customer service representatives? Or which responses lead to irritation in the policyholders’ voices?
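The adjusters’-notes question above hinges on one step: making unstructured text machine-readable. A minimal sketch of that step is a bag-of-words vectorizer that maps free-text notes to fixed-length count vectors over a shared vocabulary; the notes below are invented examples, not real claims data.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a note and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def vectorize(notes):
    """Map each note to fixed-length counts over a shared vocabulary."""
    vocab = sorted({tok for note in notes for tok in tokenize(note)})
    rows = []
    for note in notes:
        counts = Counter(tokenize(note))
        rows.append([counts.get(tok, 0) for tok in vocab])
    return vocab, rows

notes = [
    "Water damage in kitchen, prior inspection noted leak",
    "Hail damage to roof, no prior claims",
]
vocab, matrix = vectorize(notes)
print(len(vocab), matrix[0][vocab.index("damage")])   # -> 13 1
```

Each row of `matrix` is now structured data that downstream models can cross-reference with photos of physical damage or prior inspections.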

Bringing AI to Insurance

What is an insurer to do? Start by not fretting. We propose two considerations to facilitate a sleep-at-night perspective. First, insurers are already good at AI or its precursor technologies. The applicability of AI in the present and near future is entirely based on narrow AI technologies. For example, natural language processing and image recognition are both machine learning implementations with working business applications right now. Both use predictive models to achieve results. The software may consist of artificial neural networks trained on vast data sets, but these models are nonetheless conceptually compatible with things insurance carriers have used for years, like actuarial pricing models. The point is that the application of AI is an incremental step forward in the types of models and data already applied in the business.

Second, sorting through the hype requires a staple of good business decision making: the risk-cost-benefit analysis. Determining which technologies are worth investment is within scope for decision makers that otherwise know how to make selective investments in growing the capabilities of their firm. The problems faced by a carrier are much bigger than sorting out AI if management lacks the basic skillset for making business investments.

Providing an inventory of every application of AI is beyond the scope of this article. DeepIndex provides a list of 405 applications at deepindex.org, from playing the Atari 2600 to spotting forged artworks. Instead, suppose that AI, like electricity, will be broadly applicable across industries and functions, including the components of the insurance value chain from distribution to pricing and underwriting to claims. The goal is to identify and implement the AI-empowered solutions that will further a competitive advantage. Our view is that carriers’ success with AI requires three key ingredients: data, infrastructure and talent.

Data: AI might be considered the key that unlocks the door of big data. Many of the modeling techniques that fall under the AI umbrella are data-hungry classification algorithms. Unlocking the power of these methods requires a sufficient volume of training data. Data takes several forms. First, there are third-party data sources that are considered external to the insurance industry. Aerial imagery (and the processing thereof) to determine building characteristics or estimate post-catastrophe claims potential is an easy example, as are the vast quantities of behavioral data built on the interactions of users with digital platforms like social media and web search. Closer to home, insurance has long been an industry of data, and carriers are presumed to have meaningful datasets in claims, applications and marketing, among others.

Infrastructure: Accessing the data to feed the AI requires a working infrastructure. How successfully can you ingest external data sources? How disparate and unstructured can those sources be? Cloud computing is not necessarily a prerequisite to successful AI, but access to vast, scalable infrastructure is enabling. Are your information systems equipped, including security vetting, to do modeling in the cloud? Can you extract your internal data into forms that are ready to be processed using advanced modeling techniques? Or are you running siloed legacy systems that prevent using your proprietary data in novel ways?

Talent: Add data science to the list of AI-related buzzwords. We claimed earlier that many of the advancements attributed to narrow AI are predictive models conceptually like modeling techniques already used in the insurance industry. However, the fact that your pricing actuary conceptually appreciates an artificial neural net built for fraud detection using behavioral data does not mean you have the in-house expertise to build such a model. Investments in recruiting, training and retaining the right talent will provide two clear benefits. The first benefit is being better equipped to do the risk-cost-benefit analysis of which data and methods to explore. The second is having the ability to test and, ultimately, implement.

See also: 4 Ways Connectivity Is Revolutionary  

In Aon’s 2017 Global Insurance Market Outlook we explored the idea of the third wave of innovation as propounded by Steve Case, founder of AOL, in his book, “The Third Wave: An Entrepreneur’s Vision of the Future.” The upshot of the third wave for insurers was that partnership with technology innovators, rather than disruption by them, would be the norm. This approach applies now more than ever as technological innovators continue to unlock the potential of AI. If you don’t have the data, the infrastructure or the talent to bring the newest technologies to bear, you can partner with someone who does.

Artificial intelligence is real. While the definitions are somewhat vague (is it software, predictive models, neural nets or machine learning?) and the hype can be difficult to look past, the impacts are already being felt in the form of chatbots, image processing and behavioral prediction algorithms, among many others. The carriers that can best take advantage of the opportunities will be those with a pragmatic ability to evaluate tangible AI solutions that are incremental to existing parts of their value chain.

“If you don’t have an AI strategy, you are going to die in the world that’s coming.” – Devin Wenig, CEO, eBay

Maybe so, but that does not make the task daunting. The core of insurance is this: Hire the right people, give them the infrastructure they need to evaluate risk better than the competition and curate the necessary data to feed the classification models they build. AI hasn’t changed that, and it won’t.

5 Key Effects From AI and Data Science

In the digital era of innovative products and services, insurtech technologies are bringing great opportunities to the insurance sector and accelerating the industry’s transformation. Advances in AI and data science are leading insurers toward the effective use of machine learning, data modeling and predictive analytics to improve back-end processes and to streamline and automate the front-end experience for both consumers and insurance companies.

Here are five ways that insurance companies are applying AI and data to the industry:

1. Front-end sales, underwriting and policy service

Customers are acquiring insurance policies much faster and more easily with the help of automated processes. These technologies differ depending on the systems that employ them and the people they serve. Integration gateways relying on data and AI are creating new customer experiences.

See also: Seriously? Artificial Intelligence?  

2. Back-end claim services

AI, IoT, predictive analytics and data modeling let insurers rework the claims process so that claims are easier to file, submit, adjust and reimburse. This means customers have their claims settled in an expedited manner. Patterns of fraud are detected, learned from and shared via data models and the AI that combs claims for key information.

3. Business intelligence and big data

Smartphones, telematics and sensors from wearables and connected homes provide a wealth of new data. In a connected world, insurers can generate insights from both external and sensor-based data sources. How this data is collected, stored and used will determine whether insurers build or lose trust with customers. Insurers must also take the necessary measures to harden their networks and reduce the threat of cybercrime.

4. Customer experience

Insurance companies need to offer their services in a way that encourages loyalty, customer retention and loss mitigation. This can be made possible by making policy acquisition easier and keeping policyholders engaged. It’s now common for insurers to monitor driving, health and home behavior through mobile apps and wearables. In exchange for the data, carriers offer lower or customized premiums to customers whose score reflects reduced risk.

5. Customized insurance

Carriers offer insurance packages and plans based on a matrix of factors. This requires their agents to possess extensive knowledge about products as well as their new and prospective clients. Through machine learning, millions of data patterns can be analyzed to identify the most appropriate customized plan or product for a particular customer. The offer can even be delivered via AI.

Data modeling and artificial intelligence are advancing rapidly. They are laying the foundation of an industry equipped to quickly take clients from prospect to policyholder with minimal touch points and reduced risk.

See also: Motto for Success: ‘Me, Free, Easy’  

Where exactly these technologies will lead us next is anyone’s guess, but carriers have begun to realize the benefits. A historically slow-to-move, conservative industry is now more nimble, innovative and tech-savvy than ever before. Transformation is here!

How Do Actuarial, Data Skills Converge?

Our survey of leading carriers shows that insurers are increasingly looking to integrate data scientists into their organizations. This is one of the most compelling and natural opportunities within the analytics function.

This document provides a summary of our observations on what insurers’ analytics function will look like in the future, the challenges carriers are currently facing to make this transition and how they can address them.

We base our observations on our experience serving a large portion of U.S. carriers. We supplemented our findings through conversations with executives at a representative sample of these carriers, including life, commercial P&C, health and specialty risk.

We also specifically address the issue of recruitment and retention of data scientists within the confines of the traditional insurance company structure.

The roles of actuaries and data scientists will be very different in 2030 than they are today

Actuaries have traditionally been responsible for defining risk classes and setting premiums. Recently, data scientists have started getting involved in building predictive analytics models for underwriting, in place of traditional intrusive procedures such as blood tests.

By 2030, automated underwriting will become the norm, and new sources of data may be incorporated into underwriting. Mortality prediction will become ever more accurate, leading to more granular (possibly at individual level) premium setting. Data scientists will likely be in charge of assessing mortality risks, while actuaries will be the ones setting premiums, or “putting a price tag on risk” – the very definition of what actuaries do.

Risk and capital management requires extensive knowledge of the insurance business and risks, and the ability to model the company’s products and balance sheet under various economic scenarios and policyholder assumptions. Actuaries’ deep understanding and skills in these areas will make them indispensable.

We do not expect this to change in the future, but by 2030, data scientists will likely play an increased role in setting assumptions underlying the risk and capital models. These assumptions will likely become more granular, based more on real-time data, and more plausible.

Actuaries have traditionally been responsible for performing experience studies and updating assumptions for in-force business. The data used for the experience studies are based on structured data in the admin system. Assumptions are typically set at a high level, varying by a few variables.

By 2030, we expect data scientists to play a leading role and to incorporate non-traditional data sources, such as call center records or wearable devices, to analyze and manage the business. Assumptions will be set at a more granular level – instead of a 2% overall lapse rate, new assumptions will identify which 2% of the policies are most likely to lapse.
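The move from a flat 2% lapse rate to identifying which policies are likeliest to lapse can be sketched with a toy logistic score. The attributes, weights and bias below are hypothetical placeholders, not a fitted model; in practice the weights would be learned from historical lapse experience.

```python
import math

def lapse_score(policy, weights, bias=-3.0):
    """Logistic score in (0, 1) from a linear combination of features."""
    z = bias + sum(weights[k] * policy[k] for k in weights)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights: lapse risk falls with tenure, rises with premium
# increases and missed payments.
weights = {"years_since_issue": -0.1,
           "premium_increase_pct": 0.08,
           "missed_payments": 0.9}

policies = {
    "P001": {"years_since_issue": 8, "premium_increase_pct": 0, "missed_payments": 0},
    "P002": {"years_since_issue": 1, "premium_increase_pct": 25, "missed_payments": 2},
    "P003": {"years_since_issue": 5, "premium_increase_pct": 5, "missed_payments": 0},
}

scores = {pid: lapse_score(p, weights) for pid, p in policies.items()}
# Flag the likeliest lapse instead of assuming 2% across the whole block
likeliest = max(scores, key=scores.get)
print(likeliest)   # -> P002
```

Ranking the scored policies and taking the top tail is what turns an aggregate assumption into a policy-level one.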

See also: Wave of Change About to Hit Life Insurers

Actuaries are currently entirely responsible for development and certification of reserves per regulatory and accounting guidelines, and we expect signing off on reserves to remain the remit of actuaries.

Data scientists will likely have an increased role in certain aspects of the reserving process, such as assumption setting. Some factor-based reserves, such as IBNR, may also increasingly be established using data-driven, sophisticated techniques in which data scientists will likely play a part.
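As one concrete example of a data-driven, factor-based calculation, the sketch below applies the classic chain-ladder method to a small, invented cumulative claims triangle: development factors are estimated from the data, each accident year is projected to ultimate, and the unpaid remainder serves as the reserve estimate.

```python
def development_factors(triangle):
    """Volume-weighted link ratios between successive development periods."""
    factors = []
    for d in range(len(triangle[0]) - 1):
        num = sum(row[d + 1] for row in triangle if len(row) > d + 1)
        den = sum(row[d] for row in triangle if len(row) > d + 1)
        factors.append(num / den)
    return factors

def ultimate(row, factors):
    """Project a partially developed accident year to its ultimate value."""
    value = row[-1]
    for f in factors[len(row) - 1:]:
        value *= f
    return value

# Invented cumulative paid claims: rows are accident years, columns are
# development periods (recent years are less developed).
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 160.0],
    [120.0],
]
factors = development_factors(triangle)
reserve = sum(ultimate(row, factors) - row[-1] for row in triangle)
print(round(reserve, 1))   # -> 90.9
```

A data science team would extend this by letting the factors vary with policy-level predictors rather than applying one factor per development period.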

Comparing actuarial and data science skills

Although actuaries and data scientists share many skills, there are distinct differences between their competencies and working approaches.

PwC sees three main ways to accelerate integration and improve combined value

1. Define and implement a combined operating model. Clearly defining where data scientists fit within your organizational structure and how they will interact with actuaries and other key functions will reduce friction with traditional roles, enhance change management and enable clearer delineation of duties. In our view, developing a combined analytics center of excellence is the most effective structure to maximize analytics’ value.

2. Develop a career path and hiring strategy for data scientists. The demand for advanced analytical capabilities currently far eclipses the supply of available data scientists. Having a clearly defined career path is the only way for carriers to attract and retain top data science (and actuarial) talent in an industry that is considered less cutting-edge than many others. Carriers should consider the potential structure of their future workforce, where to locate the analytics function to ensure adequate talent is locally available and how to establish remote working arrangements.

3. Encourage cross-training and cross-pollination of skills. As big data continues to drive change in the industry, actuaries and data scientists will need to step into each other’s shoes to keep pace with analytical demands. Enabling knowledge sharing will reduce dependency on certain key individuals and allow insurers to better pivot toward analytical needs. It is essential that senior leadership make appropriate training and knowledge-sharing resources available to the analytics function.

Options for integrating data scientists

Depending on the type of carrier, there are three main approaches for integrating data scientists into the operating model.

Talent acquisition: Growing data science acumen

Data science talent acquisition strategies are top of mind at the carriers with whom we spoke.

See also: Digital Playbooks for Insurers (Part 3)  

Data science career path challenges

The following can help carriers overcome common data science career path challenges.

Case study: Integration of data science and actuarial skills

PwC integrated data science skills into actuarial in-force analytics for a leading life insurer so the company could gain significant analytical value and generate meaningful insights.


This insurer had a relatively new variable annuity line without much long-term experience gauging its risk. Uncertainty about excess withdrawals and a rise in future surrender rates had major implications for the company’s reserve requirements and strategic product decisions. Traditional actuarial modeling approaches offered only six to 12 months of confidence at a high level, with only a few variables, and were not adequate for major changes in the economy or policyholder behavior at a more granular level.


After engaging PwC’s support, in-force analytics expanded to use data science skills such as statistical and simulation modeling to explore possible outcomes across a wide range of economic, strategic and behavioral scenarios at the individual household-level.

Examples of data science solutions include:

  • Applying various machine learning algorithms to 10 years of policyholder data to better identify the most predictive variables.
  • Using statistical matching techniques to enrich the client data with various external datasets and thereby create an accurate household-level view.
  • Developing a simulation model to simulate policyholder behavior in a competitive environment as a sandbox to run scenario analysis over a 30-year period.


The enriched data factored in non-traditional information, such as household employment status, expenses, health status and assets. The integrated model that simulated policyholder behavior allowed for more informed estimates of withdrawals, surrenders and annuitizations. Modeling “what if” scenarios helped in reducing the liquidity risk stemming from uncertainty regarding excess withdrawals and increase in surrender rates.
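The behavioral simulation described above can be caricatured in a few lines of Monte Carlo. Each hypothetical policy faces a per-year surrender probability that jumps in years when competitors’ rates are assumed to be more attractive; every probability and rate below is invented for illustration, not taken from the engagement.

```python
import random

def simulate_surrenders(n_policies, years, base_rate, shocked_rate, seed=42):
    """Count policies that surrender over the horizon, with a higher
    surrender probability in randomly occurring competitive-rate years."""
    rng = random.Random(seed)
    surrendered = 0
    for _ in range(n_policies):
        for _ in range(years):
            shock = rng.random() < 0.3              # competitive-rate year?
            p = shocked_rate if shock else base_rate
            if rng.random() < p:
                surrendered += 1
                break                               # policy leaves the block
    return surrendered

base = simulate_surrenders(10_000, years=10, base_rate=0.02, shocked_rate=0.02)
stressed = simulate_surrenders(10_000, years=10, base_rate=0.02, shocked_rate=0.10)
print(base, stressed)   # the stressed scenario surrenders far more policies
```

Running many such “what if” scenarios is what lets a carrier bound its liquidity exposure to excess withdrawals and surrender spikes.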

All of these allowed the client to better manage its in-force, reserve requirements and strategic product decisions.

This report was written by Anand Rao, Pia Ramchandani, Shaio-Tien Pan, Rich de Haan, Mark Jones and Graham Hall.

The Challenges of ‘Data Wrangling’

A couple of conversations with data leaders have reminded me of the data wrangling challenges that a number of you are still facing.

Despite the amount of media coverage for deep learning and other more advanced techniques, most data science teams are still struggling with more basic data problems.

Even well-established analytics teams can still lack the single customer view, easily accessible data lake or analytical playpen that they need for their work.

Insight leaders also regularly express frustration that they and their teams are still bogged down in ‘data firefighting’, rather than getting to analytical work that could be transformative.

Part of the problem may be lack of focus. Data and data management are often still considered the least sexy part of customer insight or data science. All too often, leaders lack clear data plans, models or strategy to develop the data ecosystem (including infrastructure) that will enable all other work by the team.

Back in 2015, we conducted a poll of leaders, asking about their use of data models and metadata. Shockingly, none of those surveyed had conceptual data models in place, and half also lacked logical data models. Exacerbating this lack of a clear, technology-independent understanding of their data, all the leaders surveyed cited a lack of effective metadata. Without these tools in place, data management risks considerable rework and a DIY, best-endeavors feel.

See also: Next Step: Merging Big Data and AI  

So, what are the common data problems I hear when meeting data leaders across the country? Here is the one that crops up most often:

Too much time taken up on data prep

I was reminded of this often-cited challenge by a post on LinkedIn from Martin Squires, the experienced leader of Boots’ insight team. Sharing a post originally published in Forbes magazine, Martin reflected on how little has changed in 20 years. The survey shows that, just as Martin and I found 20 years ago, more than 60% of data scientists’ time is taken up with cleaning and organizing data.

The problem might now have new names, like data wrangling or data munging, but the problem remains the same. From my own experience of leading teams, this problem will not be resolved by just waiting for the next generation of tools. Instead, insight leaders need to face the problem and resolve such a waste of highly skilled analyst time.

Here are some common reasons that the problem has proved intractable:

  • Underinvestment in technology whose benefit is not seen outside of analytics teams (data lakes/ETL software)
  • Lack of transparency to internal customers as to amount of time taken up in data prep (inadequate briefing process)
  • Lack of consequences for IT or internal customers if situation is allowed to continue (share the pain)

On that last point, I want to reiterate advice given to coaching clients. Ask yourself honestly, are you your own worst enemy by keeping the show on the road despite these data barriers? Have you ever considered letting a piece of work or regular job fail, to highlight technology problems that your team are currently masking by manual workarounds? It’s worth considering as a tactic.

Beyond that more radical approach, what can data leaders do to overcome these problems and achieve delivery of successful data projects to reduce the data wrangling workload? Here are three tips that I hope help set you on the right path.

Create a playpen so that play can prioritize the data needed

Here, once again, language can confuse or divide. Whether one talks about data lakes or, less impressively, playpens or sandpits within a server or data warehouse, common benefits can be realized.

More than a decade working across IT roles, followed by leading data projects from the business side, taught me that one of the biggest causes of delay and mistakes was data mapping work. The arduous task of accurately mapping all the data required by a business, from source systems through any required ETL (extract, transform and load) layers, on to the analytics database solution is fraught with problems.

All too often this is the biggest cost and cause of delays or rework for data projects. Frustratingly, those who audit usage afterward can find that not all the data loaded is actually used. So, after considerable effort from both IT and insight teams, only a subset of the data really added value.

This is where a free-format data lake or playpen can really add value. It should enable IT to dump data there with minimal effort, or insight teams to pull potential data sources into the playpen as one-off extracts. Here, analysts or data scientists have the opportunity to play with the data. However, this capability is far more valuable than it sounds; better language is perhaps “data lab.” Here, the business experts can try out different potential data feeds and the variables within them, and learn which are actually useful, predictive or used for analysis or modeling that will add value.

The great benefit of this approach is that it enables a lower-cost, more flexible way of de-scoping the data variables and feeds actually required in live systems. Reducing those can radically increase the speed of delivery for new data warehouses or for releases of changes and upgrades.
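As a minimal sketch of the usage audit described above, one could log which playpen columns analysts actually touch and flag the rest for de-scoping before the production build. All feed and column names here are hypothetical examples, not a real schema:

```python
# Sketch: audit which playpen columns were actually used in analysis,
# so unused feeds can be de-scoped from the live data warehouse build.
# Column names are illustrative only.

loaded_columns = {
    "policy_id", "postcode", "premium", "claim_count",
    "marketing_segment", "legacy_branch_code",
}

# Columns referenced in analysts' work (in a real audit, these might be
# parsed from SQL query logs or notebook code).
used_in_analysis = {"policy_id", "postcode", "premium", "claim_count"}

unused = sorted(loaded_columns - used_in_analysis)
print(f"{len(unused)} of {len(loaded_columns)} columns never used: {unused}")
```

Even a crude audit like this gives IT an evidence-based shortlist of feeds that can be dropped from scope, which is where the delivery speed-up comes from.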

Recruit and develop data specialist roles outside of IT

The approach proposed above, together with the innumerable change projects across today’s businesses, needs to be informed by someone who knows what each data item means. That may sound obvious, but too few businesses have clear knowledge management or career development strategies to meet that need.

Decades ago, small IT teams contained long-serving experts who had built all the systems in use and were actively involved in fixing any data issues that arose. If they were also sufficiently knowledgeable about the business and how each data item was used by different teams, they could potentially provide the data expertise I propose. However, those days have long gone.

Most corporate IT teams are now closer to the proverbial baked bean factory. They may have the experience and skills needed to deliver the data infrastructure, but they lack any depth of understanding of the data items (the blood) that flow through those arteries. If the data needs of analysts or data scientists are to be met, they need to be able to talk with experts in data models, data quality and metadata: people who can discuss what analysts are seeking to understand or model in the real world of a customer, and translate that into the most accurate and accessible proxy among the data variables available.

So, I recommend insight leaders seriously consider the benefit of in-house data management teams, with real specialization in understanding data and curating it to meet team needs. We’ve previously posted some hints for getting the best out of these teams.

Grow incrementally, delivering value each time, to justify investment

I’m sure all change leaders and most insight leaders have heard the advice on how to eat an elephant or deliver major change. That rubric, to deliver one bite at a time, is as true as ever.

Although it can help for an insight leader to take time out, step back and consider all the data needs and gaps, leaders also need to be pragmatic about the best approach to deliver on those needs. Using the data lake approach and the data specialists described above, take time to prioritize data requirements.

See also: Why to Refocus on Data and Analytics  

Investigating data requirements so that each can be scored against both potential business value and ease of implementation (classic Boston Consulting grid style) can help with scoping decisions. But I’d also counsel against simply cherry-picking the most promising and easiest-to-access variables.
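One lightweight way to apply that grid-style scoring is to bucket each candidate data feed by its two scores. This is only a sketch; the feed names, scores and thresholds below are hypothetical, and in practice the scores would come from workshops with business stakeholders and IT:

```python
# Sketch: bucket candidate data feeds Boston-grid style by
# business value vs. ease of implementation (both scored 1-10).
# All feeds and scores are illustrative.

candidates = {
    "claims_history":  {"value": 9, "ease": 7},
    "telematics_feed": {"value": 8, "ease": 3},
    "legacy_branch":   {"value": 2, "ease": 8},
    "web_clickstream": {"value": 3, "ease": 2},
}

def bucket(scores, threshold=5):
    """Map a value/ease score pair to a grid quadrant."""
    hi_value = scores["value"] >= threshold
    hi_ease = scores["ease"] >= threshold
    if hi_value and hi_ease:
        return "quick win"
    if hi_value:
        return "strategic (plan for later wave)"
    if hi_ease:
        return "low value (deprioritize)"
    return "avoid"

# Review feeds from most to least valuable.
for name, scores in sorted(candidates.items(),
                           key=lambda kv: (kv[1]["value"], kv[1]["ease"]),
                           reverse=True):
    print(f"{name:15s} -> {bucket(scores)}")
```

The point of the exercise is not the arithmetic but the conversation it forces: each feed gets an explicit, challengeable position on the grid before any ETL work is committed.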

Instead, think in terms of use cases. Most successful insight teams have grown incrementally, by proving the value they can add to a business one application at a time. So, dimensions like the differing urgency and importance of business problems come into play as well.

For the first iteration of a project to invest in extra data, then prove its value to the business to secure budget for the next wave, look for the following characteristics:

  • Analysis using data lake/playpen has shown potential
  • Relatively easy-to-access data without too many variables (in the quick-win category for the IT team)
  • Important business problem that is widely seen as a current priority to fix (with rapid impact able to be measured)
  • Good stakeholder relationship with business leader in application area (current or potential advocate)

How is your data wrangling going?

Do your analysts spend too much time hunting down the right data and then corralling it into the form needed for required analysis? Have you overcome the time burned by data prep? If so, what has worked for you and your team?

We would love to hear of leadership approaches/decisions, software or processes that you have found helpful. Why not share them here, so other insight leaders can also improve practice in this area?

Let’s not wait another 20 years to stop the data wrangling drain. There is too much potentially valuable insight or data science work to be done.