Tag Archives: unstructured data

Role of Unstructured Data in AI

The process of making artificial intelligence (AI) systems interact more like humans makes some people uncomfortable, but AI is not about replacing humans. In reality, it is much more about removing the robot from humans. A big part of AI’s value lies in automating manual processes and analyzing vast amounts of data quickly so that humans are free to accomplish higher-order tasks that require reason and judgment. To get to this point, however, AI systems must be able to communicate with users and analyze natural forms of data (aka unstructured data) — all of the free-flowing stuff that is unable to be packaged in a neat way, things like voice, images and text.

Unstructured data is vital to the development of an AI system. The better an AI system communicates with users, the more it can learn on its own and, therefore, the more efficient it will be. This is important because if an AI system requires a user to interact only in a structured format, its capabilities are dramatically limited. For AI to be successful, it has to make sense out of messy information.

In this context, let’s dive deeper into how unstructured data comes into play.

The Challenges of Unstructured Data

In the human world, you and I do not speak by protocol when we carry on a conversation. We say whatever pops into our heads, in some configuration that may or may not follow convention. We use slang, incorporate sarcasm and crack jokes. It is not natural for us to organize our everyday language and the information we wish to convey into neat columns and rows. Speech is natively unstructured.

If you’ve ever interacted with Amazon’s Alexa, you know that, while the Echo system has generally become quite proficient at understanding free-form commands, the lack of a defined protocol can sometimes cause problems — or at least humorous responses when Alexa attempts to answer queries that don’t fit the mold. Amazon has poured massive resources and millions of dollars into creating and perpetually refining the algorithms that enable this humanlike voice to respond to commands, but, as adept as Echo has become at deciphering free-flowing language, Alexa still has flaws.

The Alexa example highlights the complexity of one type of unstructured data. An AI system’s ability to process and create a numerical equivalent to text is also a tall order, especially when you consider nuance and the importance of context. And imagine a machine trying to “understand” what is happening in that picture from your family vacation or an image in an art history textbook covering Impressionism.
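To make “a numerical equivalent to text” a little more concrete, here is a minimal sketch of one of the simplest approaches, a bag-of-words count vector. The function names and the sample utterances are illustrative assumptions, not taken from any particular AI product, and production systems rely on much richer representations.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(documents):
    """Map each distinct word across all documents to a fixed column index."""
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}

def vectorize(text, vocabulary):
    """Turn free text into a list of word counts, one slot per vocabulary word."""
    counts = Counter(tokenize(text))
    # Words that are not in the vocabulary are simply ignored.
    return [counts.get(word, 0) for word in vocabulary]

# Hypothetical sample utterances, invented for illustration.
documents = [
    "play some jazz music",
    "play the news",
]
vocabulary = build_vocabulary(documents)
vectors = [vectorize(doc, vocabulary) for doc in documents]
```

A representation like this throws away word order, so sarcasm, jokes and context are lost entirely, which is one reason nuance is such a tall order.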

The complications associated with processing unstructured data are perhaps the biggest obstacles for AI in the enterprise. Yet, they are not insurmountable.

The Importance of Expertise

Unstructured data is inherently noisy. As such, it requires substantial expertise to cut through and tease out patterns, then develop models that recognize those patterns. Data scientists are pushing aggressively to improve AI systems, and the biggest successes underscore that human instinct and experience are required. This usually happens when a team is focused on a very narrow application of AI.

Let’s look at the workers’ compensation claims process as an example. Teams of data scientists with a deep knowledge of claims can create predictive models based on key indicators they spot. They incorporate unstructured data such as diagnostics, drug information and claim notes. In doing so, the AI system assesses early indicators and determines that a certain claim might be denied. It can then provide an alert to users. A claims representative can figure out how to intervene and give a particular claim more care to prevent the claimant’s attorney from getting involved (typically, denied claims wind up involving an attorney, which gets very expensive and takes a long time to resolve).

In this case, it is easy to see how the AI system provides assistance to its users, and there is also a tremendous boost in accuracy when that unstructured data is incorporated versus relying on structured data alone. There is a gold mine of information and insight in the unstructured data (e.g., information about comorbidities) that just doesn’t find its way into structured data consistently. With each additional piece of information, the AI system gets smarter, and results improve. This translates to greater efficiency and lower claims costs.
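As a rough illustration of combining structured fields with signals pulled from claim notes, the sketch below flags comorbidity keywords in free text and folds them into a simple additive score. The keyword list, field names, weights and threshold are all hypothetical; a real model would be trained on claims history by data scientists who know the domain.

```python
from dataclasses import dataclass

# Hypothetical comorbidity terms to look for in free-text claim notes.
COMORBIDITY_TERMS = ("diabetes", "hypertension", "obesity")

@dataclass
class Claim:
    claim_id: str
    days_since_injury: int  # structured field
    prior_claims: int       # structured field
    notes: str              # unstructured adjuster notes

def comorbidities_in_notes(notes):
    """Return the comorbidity terms mentioned in the notes (case-insensitive)."""
    lowered = notes.lower()
    return [term for term in COMORBIDITY_TERMS if term in lowered]

def risk_score(claim, use_notes=True):
    """Toy additive score; the weights are made up for illustration."""
    score = 0.0
    score += 1.0 if claim.days_since_injury > 30 else 0.0
    score += 0.5 * claim.prior_claims
    if use_notes:
        score += 1.0 * len(comorbidities_in_notes(claim.notes))
    return score

def needs_attention(claim, threshold=2.0, use_notes=True):
    """Alert a claims representative when the score reaches the threshold."""
    return risk_score(claim, use_notes) >= threshold

# Hypothetical claim, invented for illustration.
claim = Claim(
    claim_id="C-001",
    days_since_injury=45,
    prior_claims=0,
    notes="Claimant reports back pain. History of diabetes and hypertension.",
)
```

With the structured fields alone, this example claim scores below the threshold and would not raise an alert; the two conditions mentioned only in the notes are what push it over.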

This is just one example of one benefit from incorporating unstructured data into an enterprise AI system. It takes time and diligence to crack the code, but the payoff is gaining a level of insight that has never been possible before — and getting it in a matter of minutes or hours compared with days or weeks.

Unstructured Data Is the Key

Every AI system needs to interact with users in a natural way. Organizations must have a sharp focus on this. In fact, there is a huge gap in a company’s offering if unstructured data analysis is not part of the road map.

While unstructured data is challenging, Amazon, Google, Apple and others have opened a lot of opportunities for AI applications. We can take these advances and apply them to enterprise applications where they have an enormous impact.

By taking the time to apply expertise and sound data science, we can make breakthroughs. We will not only improve accuracy in data analysis through unstructured data but also achieve fundamentally new ways of thinking, communicating and utilizing information in the future.

As first published in The Innovation Enterprise.

Marketers Bringing Action to Big Data

Today’s mobile, social world has created an explosion of data that is presenting great opportunities for all industries, especially insurance. Consider that by 2020 new information produced per second for every human being will reach 1.7 megabytes. And the volume of big data will increase from 4.4 zettabytes to roughly 44 zettabytes, or 44 trillion gigabytes.
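For readers who want to check the unit conversion, the short sketch below works through the arithmetic. It assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the function name is just for illustration.

```python
BYTES_PER_GIGABYTE = 10**9    # decimal (SI) gigabyte
BYTES_PER_ZETTABYTE = 10**21  # decimal (SI) zettabyte

def zettabytes_to_gigabytes(zettabytes):
    """Convert a whole number of zettabytes to gigabytes."""
    return zettabytes * BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE

# 44 zettabytes expressed in gigabytes (44 followed by 12 zeros).
projected_gb = zettabytes_to_gigabytes(44)

# Growth from 4.4 to 44 zettabytes, rounded to avoid float noise.
growth_factor = round(44 / 4.4, 6)
```

One zettabyte is a trillion gigabytes under these definitions, so 44 zettabytes is 44 trillion gigabytes, a tenfold increase over 4.4 zettabytes.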

With large data resources, carriers and their customers are collaborating more efficiently, resulting in better, faster, and more valuable interactions that in the end are intended to deliver a better consumer experience. They’re also entering a time where they can be more accurate and precise. For example, the data available today enables theoretical “pools of 1” versus the typical insurance pools that have led to risk sharing across large groups of people. In addition, the vast majority of data is unstructured: social media postings, online and offline shopping activity, emails, reports, and interviews. This isn’t the data we’re used to, and its potential brings both pros and cons for insurance.

Pros and Cons

Without a doubt, big data’s influence is present throughout the insurance value chain – more specifically, during product development, pricing, marketing, sales, customer service, claims and management activities. Data is also being used to streamline the application and claims process. Applying machine learning algorithms to outcomes is helping claims processing. There’s also been a noted reduction in fraud through better identification techniques.

On the flip side, complexity and volume of data may present hurdles for less data-centric and smaller insurers. There are challenges in terms of technology and data science resource constraints, as well as increased consumer privacy concerns. Further, we see some companies unable to leverage data because their culture doesn’t support innovation. Some carriers, such as Progressive, don’t have that problem. Another example is Nationwide, where the company’s chief data officer, Jim Tyo, has a stated goal “to not be an insurance company but to be a data company that sells insurance.” Unfortunately, we bump up against folks at other companies who are on the opposite end of the “strong data culture” spectrum.


While most insurance carriers have an overabundance of data about their prospects and customers, the challenge is making that data accessible, actionable and relevant in real time. The undeniable goal is to ensure the data adds value to the business by helping it acquire customers and retain and grow the customer base. It’s essential to gain access to the right data at the right time and turn one-time buyers into lifelong customers.

Big data is making it easier to target markets with more precision and assist with personalized marketing (see a recent McKinsey article on this topic)—both of which improve the customer experience. With so much data available, ensuring relevance and quality is a key difference between those successfully using big data and those who are struggling to understand it. New technologies are enabling insurance marketers like never before to sort quickly through multiple potential data sources to identify those relevant to them.

And it’s not just new data sources that offer opportunity. Our customers are also pushing the envelope by finding new use cases from existing data sources. Those who embrace this level of innovation are growing profits and gaining market share.

Lifetime Value

After 20-plus years of online media evolution, insurance marketers have started to see that an individual digital event—where the consumer is researching or raising his or her hand for an insurance product on a given brand’s site or a third-party comparison site—is one moment in time in the consumer’s journey. It’s one of several critical moments where carriers are aligning their engagement efforts. And these moments are fueling the big data available to insurance marketers, which is evidenced by the nearly 1.5 million unique online insurance events my organization sees every day.

The customer’s engagement involves research ahead of the quote request and more research after, ultimately leading to the conversion event. All of the breadcrumbs along the journey tend to be inaccessible to marketers or the media partners that are creating this behavioral data. Brands and partners are both challenged to connect these intent signals, but they are incredibly important. Technology to connect these events in the consumer’s journey is essential.

Done right, and in partnership with the digital ecosystem, these tools can identify individual consumer behavior and link multiple activities regardless of device type. That data can be converted into insights that can then be leveraged in real time to retain current customers, grow relationships with existing customers and establish new relationships.

The majority of the top insurance companies in the U.S. are connecting the dots and using sophisticated technology and data to gain real-time intelligence into the origin, history and intent of prospects and customers. Such solutions enable carriers and agents to follow consumers on their buying journeys until the end when they purchase a policy, helping insurers observe and access behavioral data they can use to analyze the intent of the consumer at any given moment.

When marketers gain the ability to identify and take action on data, they can be more efficient and simultaneously enhance the consumer experience and increase customer lifetime value.

Cognitive Computing: Taming Big Data

In the complex, diverse insurance industry, it can be hard to reconcile theory and practice. Adapting to new processes, systems, and strategies is always challenging. However, with the arrival of new opportunities, cultural transformation will go more smoothly.

Insurance companies that are considering how to plug into the insurtech landscape should understand the various models within the innovation ecosystem. Carriers have to weigh their options carefully before choosing between incubators and accelerators, or venture capital and partnerships, when creating their best internal and external teams.

The key elements disrupting the insurance industry include the Internet of Things (IoT), wearables, big data, artificial intelligence and on-demand insurance. Although well-established business models, processes and organizations are being forced to adapt, insurtech can be more collaborative than disruptive.

It is no secret that the insurance industry is responding to changing market dynamics such as new regulations, legislation and technology. With digital transformation, there are numerous ways technology can improve and streamline current insurance processes.

Cognitive Computing

Cognitive computing, a subset of AI, mimics human intelligence. It can be deployed to radically streamline industry processes. According to the 2016 IBM Institute for Business Value survey, 90% of insurance executives believe that cognitive technologies will have an impact on their revenue models.

The ability of cognitive technologies to handle both structured and unstructured data in new ways will foster advanced models of business operations and processes.

Insurance carriers can use this technology for improved customer self-service, call-center assistance, underwriting, claims management and regulatory compliance.

Big Data

Unstructured data is rapidly growing every day. For instance, wearables can provide insurance companies with massive amounts of data that can yield insights about their markets. Social media also produces a flood of data.

To harvest this data intelligently, insurers need to adopt the right analytical solutions to analyze, clean and verify data to customize their offerings according to their clients’ individual needs. Predictive analytics evaluates the trends found in big data to determine risk, set premiums, quote individual and group insurance policies and target key markets more accurately.

Linking the Two

Insurance organizations may have more data than they realize or know what to do with. Existing data is coming in from different core systems, and new data is being captured with IoT devices like wearables and sensors. Cognitive computing is the link to organizing and optimizing this data for use.

Whether it is used to predict risk and determine premiums, flag fraudulent claims or identify what products a customer is likely to buy, cognitive computing is the way to ensure these goals are achieved. Sorting these trends among reams of data makes them more manageable and ensures that a business’s IT objectives link back to business strategies.

Over the years, systems will evolve through learning processes to a level of intelligence that can adequately support more complex business functions. Schedule a meeting with your executive team to examine risks, opportunities and insurtech synergies that can take your organization beyond the competition.

3 Phases to Produce Real IoT Value

In May, I wrote about The Three Phases Insurers Need for Real Big Data Value, assessing how insurance companies progress through levels of maturity as they invest in and innovate around big data. It turns out that there’s a similar evolution around how insurers consume and use feeds from the Internet of Things, whether talking about sensor devices, wearables, drones or any other source of complex, unstructured data. The growth of IoT in the insurance space (especially with automotive telematics) is one of the major reasons insurers have needed to think beyond traditional databases. This is no surprise, as Novarica has explained previously how these emerging technologies are intertwined in their increasing adoption.

The reality on the ground is that the adoption of the Internet of Things in the insurance industry has outpaced the adoption of big data technologies like Hadoop and other NoSQL/unstructured databases. Just because an insurer hasn’t yet built up a robust internal skill set for dealing with big data doesn’t mean that the insurer won’t want to take advantage of the new information and insight available from big data sources. Despite the seeming contradiction in that statement, there are actually three different levels of IoT and big data consumption that allow insurers at various phases of technology adoption to work with these new sources.

Phase 1: Scored IoT Data Only

For certain sources of IoT/sensor data, it’s possible for insurers to bypass the bulk of the data entirely. Rather than pulling the big data into their environment, the insurer can rely on a trusted third party to do the work for it, gathering the data and then using analytics and predictive models to reduce the data to a score. One example in use now is third-party companies that gather telematics data for drivers and generate a “driver score” that assesses a driver’s behavior and ability relative to others. On the insurer’s end, only this high-level score is stored and associated with a policyholder or a risk, much like how credit scores are used.

This kind of scored use of IoT data is good for top-level decision-making, executive review across the book of business or big-picture analysis of the data set. It requires having significant trust in the third-party vendor’s ability to calculate the score. Even when the insurer does trust that score, it’s never going to be as closely correlated to the insurer’s business because it’s built with general data rather than the insurer’s claims and loss history. In some cases, especially for insurers with smaller books of business, this might actually be a plus, because a third party might be basing its scores on a wider set of contributory data sets. And even large insurers that have matured to later phases of IoT data consumption might still want to leverage these third-party scores as a way to validate and complement the kind of scoring they do internally.

One limitation is that a third party that aggregates and scores the kind of IoT data the insurer is interested in has to already exist. While this is the case for telematics, there may be other areas where that’s not the case, leaving the insurer to move to one of the next phases on its own.

Phase 2: Cleansed/Simplified IoT Data Ingestion

Just because an insurer has access to an IoT data source (whether through its own distribution of devices or by tapping into an existing sensor network) doesn’t mean the insurer has the big data capability to consume and process all of it. The good news is it’s still possible to get value out of these data sources even if that’s the case. In fact, in an earlier survey report by Novarica, while more than 60% of insurers stated that they were using some forms of big data, less than 40% of those insurers were using anything other than traditional SQL databases. How is that possible if traditional databases are not equipped to consume the flow of big data from IoT devices?

What’s happening is that these insurers are pulling the key metrics from an IoT data stream and loading them into a traditional relational database. This isn’t a new approach; insurers have been doing this for a long time with many types of data sets. For example, when we talk about weather data we’re typically not actually pulling all temperatures and condition data throughout the day in every single area, but rather simplifying it to condition and temperature high and low at a ZIP code (or even county) on a per-day basis. Similarly, an insurer can install telematics devices in vehicles and only capture a slice of the data (e.g. top speed, number of hard brakes and number of hard accelerations, rather than every minor movement), or filter only a few key metrics from a wearable device (e.g. number of steps per day rather than full GPS data).
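The reduction described here can be sketched as a small aggregation step. The reading format, the thresholds for a “hard” brake or acceleration and the table layout below are assumptions chosen for illustration, not any vendor’s actual definitions.

```python
import sqlite3

# Assumed thresholds (mph change between one-second readings); real definitions vary.
HARD_BRAKE_DELTA = -8.0
HARD_ACCEL_DELTA = 8.0

def summarize_trip(speeds_mph):
    """Reduce per-second speed readings to a few key metrics."""
    deltas = [b - a for a, b in zip(speeds_mph, speeds_mph[1:])]
    return {
        "top_speed": max(speeds_mph),
        "hard_brakes": sum(1 for d in deltas if d <= HARD_BRAKE_DELTA),
        "hard_accels": sum(1 for d in deltas if d >= HARD_ACCEL_DELTA),
    }

def store_summary(conn, policy_id, trip_date, summary):
    """Load the reduced metrics into an ordinary relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trip_summary ("
        "policy_id TEXT, trip_date TEXT, top_speed REAL, "
        "hard_brakes INTEGER, hard_accels INTEGER)"
    )
    conn.execute(
        "INSERT INTO trip_summary VALUES (?, ?, ?, ?, ?)",
        (policy_id, trip_date, summary["top_speed"],
         summary["hard_brakes"], summary["hard_accels"]),
    )
    conn.commit()

# Hypothetical per-second speed readings for one short trip.
readings = [0.0, 10.0, 20.0, 25.0, 30.0, 20.0, 18.0, 5.0, 0.0]
summary = summarize_trip(readings)

# An in-memory SQLite database stands in for the insurer's data warehouse.
conn = sqlite3.connect(":memory:")
store_summary(conn, "P-123", "2017-06-01", summary)
rows = conn.execute("SELECT * FROM trip_summary").fetchall()
```

Only one row per trip reaches the database, which keeps it queryable with standard SQL tools but discards the second-by-second detail that a fuller analysis would need.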

This kind of reduced data set limits the full set of analysis possible, but it does provide some benefits, too. It allows human querying and visualization without special tools, as well as a simpler overlay onto existing normalized records in a traditional data warehouse. Plus, and perhaps more importantly, it doesn’t require an insurer to have big data expertise inside its organization to start getting some value from the Internet of Things. In fact, in some cases the client may feel more comfortable knowing that only a subset of the personal data is being stored.

Phase 3: Full IoT Data Ingestion

Once an insurer has robust big data expertise in-house, or has brought in a consultant to provide this expertise, it’s possible to capture the entire range of data being generated by IoT sensors. This means gathering the full set of sensor data, loading it into Hadoop or another unstructured database and layering it with existing loss history and policy data. This data is then available for machine-driven correlation and analysis, identifying insights that would not have been available or expected with the more limited data sets of the previous phases. In addition, this kind of data is now available for future insight as more and more data sets are layered into the big data environment. For the most part, this kind of complete sensor data set is too deep for humans to use directly, and it will require tools to do initial analysis and visualization so that what the insurer ends up working with makes sense.

As insurers embrace artificial intelligence solutions, having a lot of data to underpin machine learning and deep learning systems will be key to their success. An AI approach will be a particularly good way of getting value out of IoT data. Insurers working only in Phase 1 or Phase 2 of the IoT maturity scale will not be building the history of data in this fashion. Consuming the full set of IoT data in a big data environment now will establish a future basis for AI insight, even if there is a limited insight capability to start.

Different Phases Provide Different Value

These three IoT phases are not necessarily linear. Many insurers will choose to work with IoT data using all three approaches simultaneously, due to the different values they bring. An insurer that is fully leveraging Hadoop might still want to overlay some cleansed/simplified IoT data into its existing data warehouse, and may also want to take advantage of third-party scores as a way of validating its own complete scoring. Insurers need to not only develop the skill set to deal with IoT data, but also the use cases for how they want it to affect their business. As is the case with all data projects, if it doesn’t affect concrete decision-making and business direction, then the value will not be clear to the stakeholders.

Time to Reinvent Your Products

In my previous article, I stressed that a new commercial insurance model is about breaking down existing operations and rebuilding a collaborative and innovative model. The rebuilt model would improve operational efficiencies, control costs, create innovative products, improve customer engagement experiences and produce sustained profitability. But, if you think operational silos are challenging, I suggest you suit up and put on your protective gear because I’m about to tackle the cold reality of commercial insurance product silos!

Should the market simplify risk management solutions? Duh!!!

The role of a corporate risk manager is to protect the employees, clients and balance sheet of his or her company. The manager looks to the commercial insurance markets for risk management solutions but, instead, receives product responses that create a patchwork of protection.

When the markets can only provide products in lieu of risk management solutions, a variety of customers’ risk exposures are left uninsured — they must now consider self-insuring or creating alternative financing vehicles, including captives. Creating risk management products and simplifying risk management solutions for corporate customers should be the goal of any commercial insurance broker or company.

If companies eliminated product and service silos to create a collaborative risk solutions environment, big data collection and analysis could be implemented across all existing products and customers’ uninsured exposures. This collaborative environment would also minimize current big data challenges, including:

  • Unstructured data identification
  • Weak processes to capture and manage data
  • Poor data quality and accuracy
  • Unused data

By eliminating product silos, I’m not suggesting that you eliminate your product experts. On the contrary, each product expert has a vital role to play in challenging and supporting how new and existing products will function.

The removal of internal product silos broadens product expertise and knowledge, thus enabling product experts to further comprehend the risks that corporate customers manage every day.

You already have the clients – start implementing the tools to create risk management solutions!

Currently, commercial insurance brokers and companies are better placed to respond to insurtech competition as, unlike most insurtech startups, they have a large pool of existing customers and the ability to access or create extensive risk management data. When combined with additional imported data such as ISO, ERC and AAIS, an integrated dynamic financial model (IDFM) can be easily created to capture industry, claims, exposure and risk management patterns.

Capturing and codifying risk management data is absolutely crucial, as it elevates existing industry claims and exposure models. In addition to creating new risk management products, the IDFM data outcomes can also create a variety of new risk management services, thus creating additional revenues.

I’m not a data scientist, but I have always been fascinated with risk identification. Many years ago (more than I care to remember!!!), I created an IDFM model with the crudest of tools. The model outcomes enabled me to create risk management products and services that customers wanted to buy and eliminated the “one-size-fits-all” product response embedded in the commercial insurance market.

Now, by incorporating new technologies such as AI, machine learning, bots and smart sensors to improve risk management analysis, this model becomes even more dynamic! These technological additions to the IDFM model, along with the use of blockchain, enhance the model’s ability to:

  • Gather and store even more data elements;
  • Improve decisions based on that data; and
  • Provide relevant answers to improve the company’s abilities to create risk management products.

Additionally, the IDFM model enables the creation of revenue streams such as usage-based and peer-to-peer commercial insurance. These opportunities will be further examined in future articles.

All geographic regions are not the same! Expand your mind and create something new!

The ability to capture and codify the elements noted in the IDFM model, including the client risk management review, is particularly important for global commercial insurance companies and brokers expanding into Africa, Asia and Latin America.

Risk managers in these regions continue to express their frustration with market responses as local agents, brokers and insurance companies are not expanding their product portfolios. Global companies are extremely slow in creating products that are specific to the risk management needs of these diverse local markets – instead, companies aggressively sell Europe-centric or North American-centric policies while supporting low margins. These diverse local markets would benefit from improved innovation and increased investment in technology, a subject I will explore further in future articles.

And finally ….

Commercial insurance brokers and companies must stop viewing insurance through the prism of products and, instead, recognize its true potential as a service. Remember, it is all about the customer! Or, simply put, no customers, no business.

Cutting renewal prices and watching margins decline every year is not a sustainable business plan for incumbents or insurtech companies. Instead, the commercial insurance market must break down and rebuild product and operational silos to create a collaborative and innovative model that improves its ability to package complex risk management products and services. Products and services can be presented in a simpler, more intuitive manner, with plain language and processes that clearly manage customer expectations and increase customer satisfaction. Breaking down existing products and operations and rebuilding a collaborative and innovative model will also improve operational efficiency, control costs, increase revenue streams and produce sustained profitability. Let’s break down these barriers!