
10 Trends on Big Data, Advanced Analytics

Recently, I was invited to present on the impact of big data and advanced analytics on the insurance industry at the NCSL Legislative Summit. This talk couldn’t have been timelier, as the insurance sector now tops the list of most disrupted sectors. Among the causes for this top spot are the speed of technological change, changing customer behavior, increased investment in the insurtech sector and new market entrants, such as the homeowners and renters insurance startup Lemonade. A significant driver of this disruption is technological change – especially in big data and advanced analytics.

See also: Why to Refocus on Data and Analytics  

Here are 10 key trends that are affecting big data and advanced analytics – most of which have a hand in disrupting the insurance industry:

  1. Size and scope – Big data is getting bigger and faster. With connected cars, homes and buildings, and machines, the amount of data is increasing exponentially. Investments in IoT and Industrial IoT, 5G and other related areas will only increase the speed and amount of data. With this increased volume and velocity, we will not be able to generate meaningful insights from all of this data without advanced analytics and artificial intelligence.
  2. Big data technology – The underlying technology is moving from Hadoop to streaming architectures and hybrid “translytical” databases. While concepts like “data lakes” and NoSQL databases mature, new technologies like Apache Spark, Tez, Storm, BigTop and REEF, among others, are creating a constant flow of new tools, which adds to a sense of “big data in flux.” (See the streaming sketch after this list.)
  3. Democratization – The democratization of data, business intelligence and data science is accelerating. Essentially, this means that anybody in a given organization with the right permissions can use any dataset, slice and dice the data, run analysis and create reports with very little help from IT or data scientists. This creates expectations for timely delivery, and business analysts can no longer hide behind IT timelines and potential delays.
  4. Open source movement – The open source revolution in data, code and citizen data science is accelerating access to data and the generation of insights. Open source tools are maturing and finding their way into commercial vendor solutions, and the pace of open source tool creation continues unabated; the Apache Software Foundation lists more than 350 current open source initiatives. This steady stream requires data engineers and data scientists to constantly evaluate tools and discover new ways of doing data engineering and data science.
  5. Ubiquitous intelligence – Advanced analytics – especially the various branches of artificial intelligence – is evolving into ubiquitous intelligence. AI can now interact with us through natural language, speak to us, hear us, see the world and even feel objects. As a result, it will start weaving itself seamlessly into many of our day-to-day activities – powering our searches, sorting our email, recommending things to buy based on our preferences and needs, and guiding our interactions with other people and things – without our even being aware of it. This will further heighten our sense of disruption and constant change.
  6. Deep learning – Deep learning, a subset of the machine learning family (which is itself just one area of AI), has been improving in speed, scale, accuracy, sophistication and the scope of problems it addresses. Unlike earlier techniques, which were specific to a particular type of data (e.g., text, audio, image), deep learning techniques have been applied across all types of data. This has reduced development time, increased sharing and broadened the scope of innovation and disruption.
  7. MLaaS – Machine learning, cloud computing and the open source movement are converging to create Machine Learning as a Service (MLaaS). This not only decreases the overall variable cost of using AI but also provides large volumes of data that machine learning systems can exploit to further improve their accuracy, resulting in a virtuous cycle.
  8. Funding – Big data funding peaked in 2015. However, funding for artificial intelligence, especially machine learning and deep learning, has continued to attract increasingly significant investments. In the first half of this year, more than $3.6 billion has been invested in AI and machine learning. This increased funding has attracted great talent to explore difficult areas of AI that will be disruptors of the future economy.
  9. Centers of excellence – As organizations continue to see good ROI from their initial pilots and proofs-of-concept in analytics, automation and AI, they are increasingly looking to set up centers of excellence where they can train, nurture and grow talent. The exact role of the center varies with the overall organizational culture and how the rest of the business operates – centralized, federated or decentralized.
  10. Competitive landscape – The big data landscape continues to grow, and the AI landscape is expanding rapidly. Deep learning companies are growing the fastest across multiple sectors. Competition among startups – as well as incumbents that want to stay ahead of potential disruption – is creating a vibrant ecosystem of partnerships and mergers and acquisitions that further the disruptive cycle.
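
To make trend 2 a little more concrete, here is a minimal sketch of what the shift from batch Hadoop jobs to a streaming architecture can look like, using PySpark Structured Streaming. The built-in "rate" source and the one-minute aggregation are illustrative placeholders, not a production pipeline.

```python
# A minimal sketch of the batch-to-streaming shift described in trend 2,
# using PySpark Structured Streaming. The "rate" source and the aggregation
# are placeholders for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# The built-in "rate" source generates (timestamp, value) rows for testing;
# in practice this would be a Kafka topic or another event stream.
events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Continuously aggregate events into one-minute windows instead of running
# a nightly batch job over files in HDFS.
per_minute = (
    events
    .withWatermark("timestamp", "2 minutes")
    .groupBy(F.window("timestamp", "1 minute"))
    .agg(F.count("*").alias("events"), F.avg("value").alias("avg_value"))
)

# Write running results to the console; blocks until the stream is stopped.
query = (
    per_minute.writeStream
    .outputMode("update")
    .format("console")
    .start()
)
query.awaitTermination()
```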

See also: Analytics and Survival in the Data Age  

Are there other trends you would add to the list? Share them here!

3 Phases to Produce Real IoT Value

In May, I wrote about The Three Phases Insurers Need for Real Big Data Value, assessing how insurance companies progress through levels of maturity as they invest in and innovate around big data. It turns out that there’s a similar evolution around how insurers consume and use feeds from the Internet of Things, whether talking about sensor devices, wearables, drones or any other source of complex, unstructured data. The growth of IoT in the insurance space (especially with automotive telematics) is one of the major reasons insurers have needed to think beyond traditional databases. This is no surprise, as Novarica has explained previously how these emerging technologies are intertwined in their increasing adoption.

The reality on the ground is that the adoption of the Internet of Things in the insurance industry has outpaced the adoption of big data technologies like Hadoop and other NoSQL/unstructured databases. Just because an insurer hasn’t yet built up a robust internal skill set for dealing with big data doesn’t mean it won’t want to take advantage of the new information and insight available from big data sources. Despite the seeming contradiction in that statement, there are actually three different levels of IoT and big data consumption that allow insurers at various phases of technology adoption to work with these new sources.

See also: 7 Predictions for IoT Impact on Insurance  

Phase 1: Scored IoT Data Only

For certain sources of IoT/sensor data, it’s possible for insurers to bypass the bulk of the data entirely. Rather than pulling the big data into their environment, the insurer can rely on a trusted third party to do the work for it, gathering the data and then using analytics and predictive models to reduce the data to a score. One example in use now is third-party companies that gather telematics data for drivers and generate a “driver score” that assesses a driver’s behavior and ability relative to others. On the insurer’s end, only this high-level score is stored and associated with a policyholder or a risk, much like how credit scores are used.
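
As a rough illustration, here is a minimal sketch of how Phase 1 might look in code: the insurer requests a pre-computed score from a vendor and stores only that score against the policy. The vendor endpoint, field names and scoring scale below are hypothetical.

```python
# Phase 1 sketch: store only a third-party driver score against a policy,
# never the raw telematics stream. The vendor URL, response fields and
# scoring scale are hypothetical.
import sqlite3
import requests

def fetch_driver_score(policy_id: str) -> int:
    # Hypothetical vendor API that has already aggregated the raw telematics
    # data and reduced it to a single relative score (e.g. 0-100).
    resp = requests.get(
        "https://vendor.example.com/driver-scores",
        params={"policy_id": policy_id},
        timeout=10,
    )
    resp.raise_for_status()
    return int(resp.json()["driver_score"])

conn = sqlite3.connect("policies.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS policy_scores ("
    "policy_id TEXT PRIMARY KEY, driver_score INTEGER, scored_at TEXT)"
)

# Only the high-level score is persisted, much like a credit score.
score = fetch_driver_score("POL-12345")
conn.execute(
    "INSERT OR REPLACE INTO policy_scores VALUES (?, ?, datetime('now'))",
    ("POL-12345", score),
)
conn.commit()
```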

This kind of scored use of IoT data is good for top-level decision-making, executive review across the book of business or big-picture analysis of the data set. It requires significant trust in the third-party vendor’s ability to calculate the score. And even when the insurer does trust that score, a score built on general industry data will never correlate as closely with the insurer’s business as one built on its own claims and loss history. In some cases, especially for insurers with smaller books of business, this might actually be a plus, because a third party may be basing its scores on a wider set of contributory data. And even large insurers that have matured to later phases of IoT data consumption might still want to leverage these third-party scores as a way to validate and accentuate the scoring they do internally.

One limitation is that a third party that aggregates and scores the kind of IoT data the insurer is interested in has to already exist. While this is the case for telematics, there may be other areas where that’s not the case, leaving the insurer to move to one of the next phases on its own.

Phase 2: Cleansed/Simplified IoT Data Ingestion

Just because an insurer has access to an IoT data source (whether through its own distribution of devices or by tapping into an existing sensor network) doesn’t mean the insurer has the big data capability to consume and process all of it. The good news is it’s still possible to get value out of these data sources even if that’s the case. In fact, in an earlier survey report by Novarica, while more than 60% of insurers stated that they were using some forms of big data, less than 40% of those insurers were using anything other than traditional SQL databases. How is that possible if traditional databases are not equipped to consume the flow of big data from IoT devices?

What’s happening is that these insurers are pulling the key metrics from an IoT data stream and loading them into a traditional relational database. This isn’t a new approach; insurers have been doing it for a long time with many types of data sets. For example, when we talk about weather data, we’re typically not pulling every temperature and condition reading throughout the day in every single area, but rather simplifying them to conditions plus high and low temperatures for a zip code (or even county) on a per-day basis. Similarly, an insurer can install telematics devices in vehicles and capture only a slice of the data (e.g. top speed, number of hard brakes, number of hard accelerations – rather than every minor movement), or filter only a few key metrics from a wearable device (e.g. number of steps per day rather than full GPS data).
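
The sketch below shows what that reduction might look like: a raw telematics feed is boiled down to three daily metrics and written into an ordinary relational table. The acceleration thresholds and field names are assumptions for illustration, not a device specification.

```python
# Phase 2 sketch: reduce a raw telematics stream to a handful of daily
# metrics (top speed, hard brakes, hard accelerations) and load them into
# a traditional relational table. Thresholds and field names are assumed.
import sqlite3

HARD_BRAKE_MPS2 = -3.0   # assumed deceleration threshold
HARD_ACCEL_MPS2 = 3.0    # assumed acceleration threshold

def summarize_day(readings):
    """readings: iterable of dicts like {"speed_kph": 62.0, "accel_mps2": -1.2}."""
    top_speed, hard_brakes, hard_accels = 0.0, 0, 0
    for r in readings:
        top_speed = max(top_speed, r["speed_kph"])
        if r["accel_mps2"] <= HARD_BRAKE_MPS2:
            hard_brakes += 1
        elif r["accel_mps2"] >= HARD_ACCEL_MPS2:
            hard_accels += 1
    return top_speed, hard_brakes, hard_accels

conn = sqlite3.connect("telematics_summary.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS daily_driving ("
    "policy_id TEXT, day TEXT, top_speed_kph REAL, "
    "hard_brakes INTEGER, hard_accels INTEGER)"
)

# In practice `raw_readings` would be the device feed for one day.
raw_readings = [
    {"speed_kph": 48.0, "accel_mps2": -0.8},
    {"speed_kph": 95.0, "accel_mps2": -3.4},
    {"speed_kph": 60.0, "accel_mps2": 3.2},
]
conn.execute(
    "INSERT INTO daily_driving VALUES (?, ?, ?, ?, ?)",
    ("POL-12345", "2017-08-01", *summarize_day(raw_readings)),
)
conn.commit()
```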

This kind of reduced data set limits the full set of analysis possible, but it does provide some benefits, too. It allows human querying and visualization without special tools, as well as a simpler overlay onto existing normalized records in a traditional data warehouse. Plus, and perhaps more importantly, it doesn’t require an insurer to have big data expertise inside its organization to start getting some value from the Internet of Things. In fact, in some cases the client may feel more comfortable knowing that only a subset of the personal data is being stored.

Phase 3: Full IoT Data Ingestion

Once an insurer has robust big data expertise in house, or has brought in a consultant to provide it, it’s possible to capture the entire range of data being generated by IoT sensors. This means gathering the full set of sensor data, loading it into Hadoop or another unstructured data store and layering it with existing loss history and policy data. This data is then available for machine-driven correlation and analysis, identifying insights that would not have been available or expected with the more limited data sets of the previous phases. It also remains available for future insight as more and more data sets are layered into the big data environment. For the most part, this kind of complete sensor data set is too deep for humans to use directly, and it will require tools for initial analysis and visualization so that what the insurer ends up working with makes sense.
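
A minimal sketch of what that layering might look like, assuming a Spark-based data lake; the file paths, schemas and column names are hypothetical.

```python
# Phase 3 sketch: land the full sensor feed in a big data environment and
# layer it onto policy and loss history for machine-driven analysis.
# Paths, schemas and column names are assumptions for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("full-iot-ingestion").getOrCreate()

# Full, unreduced sensor events landed as Parquet in the data lake.
sensor = spark.read.parquet("/datalake/raw/telematics_events/")

# Existing structured data exported from core systems.
policies = spark.read.parquet("/datalake/curated/policies/")
losses = spark.read.parquet("/datalake/curated/loss_history/")

# Layer the raw events onto policy and loss data so correlations between
# driving behavior and claims can be explored later.
enriched = (
    sensor
    .join(policies, on="policy_id", how="inner")
    .join(losses, on="policy_id", how="left")
)

# A simple machine-queryable aggregate; far richer analysis becomes possible
# as the full history accumulates.
(enriched
    .groupBy("policy_id")
    .agg(
        F.count("*").alias("sensor_events"),
        F.sum(F.when(F.col("claim_id").isNotNull(), 1).otherwise(0)).alias("claims"),
    )
    .write.mode("overwrite")
    .parquet("/datalake/analytics/policy_sensor_summary/"))
```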

As insurers embrace artificial intelligence solutions, having a lot of data to underpin machine learning and deep learning systems will be key to their success. An AI approach will be a particularly good way of getting value out of IoT data. Insurers working only in Phase 1 or Phase 2 of the IoT maturity scale will not be building the history of data in this fashion. Consuming the full set of IoT data in a big data environment now will establish a future basis for AI insight, even if there is a limited insight capability to start.

See also: IoT’s Implications for Insurance Carriers  

Different Phases Provide Different Value

These three IoT phases are not necessarily linear. Many insurers will choose to work with IoT data using all three approaches simultaneously, because of the different value each brings. An insurer that is fully leveraging Hadoop might still want to overlay some cleansed/simplified IoT data onto its existing data warehouse, and may also want to take advantage of third-party scores as a way of validating its own complete scoring. Insurers need to develop not only the skill set to deal with IoT data, but also the use cases for how they want it to affect their business. As with all data projects, if it doesn’t affect concrete decision-making and business direction, the value will not be clear to stakeholders.

Frustrated on Your Data Journey?

It’s going to take how much longer?! It’s going to cost how much more?!!

If those sound like all-too-familiar expressions of frustration with your data journey (or data projects), you’re in good company.

It seems most corporations these days struggle to make the progress they plan when it comes to building a single customer view (SCV) or providing the data their analysts need.

An article on MyCustomer.com by Adrian Kingwell cited a recent Experian survey that found 72% of businesses understood the advantages of an SCV, but only 16% had one in place. Following that, on CustomerThink.com, Adrian Swinscoe makes an interesting case that it can be more time- and cost-effective to build one simply by asking the customer.

That approach could work for some businesses (especially small and medium-sized businesses) and can be combined with visible data transparency, but it is much harder for large, established businesses to justify troubling the customer for data they should already have. So the challenge remains.

A recent survey on Customer Insight Leader suggests another reason for problems in “data project land.” In summary, you shared that:

  • 100% of you disagreed or strongly disagreed with the statement that you have a conceptual data model in place;
  • 50% of you disagreed (the rest were undecided) with the statement that you have a logical data model in place;
  • Only 50% of you agreed (the rest disagreed) with the statement that you have a physical data model in place.

These results did not surprise me, as they echo my experience of working in large corporations. Most appear to lack data models altogether, especially conceptual ones. Given the need to be flexible in implementation and to respond to the data quality or data mapping issues that always arise on such projects, this is concerning. With so much focus on technology these days, I fear the importance of a model/plan/map has been lost. Without a technology-independent view of the data entities, relationships and data items that a team needs to do its job, businesses will continue to be at the mercy of changing technology solutions.

Your later answers also point to a related problem that can plague customer insight analysts seeking to understand customer behavior:

  • All of you strongly disagreed with the statement that all three types of data models are updated when your business changes;
  • 100% of you also disagreed with the statement that you have effective meta data (e.g. up-to-date data dictionary) in place.

Without the work to keep models reflecting reality, and meta data sources guiding users and analysts on what fields mean and which can be trusted, both will wither on the vine. Isn’t it short-sighted to spend perhaps millions of pounds on a technology solution but then balk at the cost of the data specialists needed to manage these precious knowledge management assets?

Perhaps those of us speaking about insight, data science, big data, etc. also carry a responsibility. If it has always been true that data tends to be viewed as a boring topic compared with analytics, it is doubly true that we tend to avoid the topics of data management and data modeling. But voices need to cry out in the wilderness for these disciplines. However much Hadoop, NoSQL and other solutions can help overcome technology barriers, no one gets data solutions for their business “out of the box.” It takes hard work and diligent management to ensure data is used and understood effectively.

I hope, in a very small way, these survey results act as a bit of a wake-up call. Over the coming weeks I will be attending or speaking at various events, so I’ll also reflect on how I can speak out more effectively for this neglected but vital skill.

On that challenge of why businesses fail to build the SCVs they need, another cause has become apparent to me over the years: too often, requirements are too ambitious in the first place. Having worked on both sides of the “IT fence,” I have routinely heard analytical teams say they want all the data available (at least from the feeds they can get). Without more effective prioritization of which data feeds, or specifically which variables within those feeds, are worth the effort, projects get bogged down in excessive data mapping work.

Have you seen the value of a “data labs” approach? Finding a way for your analysts to get hold of an example data extract manually, so they can try analyzing the data and building models, can help massively. At least 80% of the time, they will find that only a few of the variables are actually useful in practice. This enables more pragmatic requirements and a leaner IT build, which is much more likely to deliver (sometimes even within time and budget).
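
As a rough illustration of that 80/20 effect, the sketch below runs a quick feature-importance check on a synthetic “labs” extract. In practice the extract would be a sample of the real feed; the column names and target here are placeholders.

```python
# "Data labs" sketch: rank candidate variables from a one-off extract before
# anyone writes IT requirements. The extract, target and column names are
# synthetic placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 5000

# Pretend extract: many candidate variables from a feed, most of them noise.
extract = pd.DataFrame({f"var_{i:02d}": rng.normal(size=n) for i in range(20)})
# Synthetic target driven by only a couple of the variables.
extract["churned"] = (
    (extract["var_03"] + 0.5 * extract["var_11"] + rng.normal(scale=0.5, size=n)) > 0.8
).astype(int)

X = extract.drop(columns="churned")
y = extract["churned"]

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importance = pd.Series(model.feature_importances_, index=X.columns)

# Typically only a handful of variables matter; those are the ones worth
# asking IT to map and productionize.
print(importance.sort_values(ascending=False).head(5))
```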

Here’s that article from Adrian Swinscoe, with links to Adrian Kingwell, too.

What’s your experience? If you recognize the results of this survey, how do you cope with the lack of data models or up-to-date meta data? Are you suffering data project lethargy as a result?