October 29, 2015
The Rise of Big (Bad) Data
by Karen Wolfe
Workers' comp data often contains reporting errors, and if bad data feeds a big data strategy, the result is merely big bad data.
The workers’ compensation industry has created and stored huge amounts of data over the past 25 years. That copious amount of data has led to a new phenomenon in our industry, as in most others: the concept of big data. The goal is to corral, manage and query the industry’s big data for greater insight.
Big data is a general term used to describe voluminous amounts of data, whether unstructured or structured. It’s that simple.
Unstructured data is a generic term used to describe data not contained in a database or some other type of prescribed data container. Examples of unstructured data are claim adjuster and medical case manager notes. The data can also include emails, videos, social media, instant messaging and other free-form types of input. Gaining reliable information from unstructured data is significantly more difficult than from structured data.
Structured data is that which is housed in a specified format in a predefined container that can be mined for information. Structured data is designed for a specific purpose so that it can be accessed and manipulated.
The workers’ compensation industry has both forms of data. However, structured data is more available for mining, analyzing and interpreting.
To turn ordinary data into big data in workers’ compensation, data from multiple silos must first be integrated. The industry uniquely maintains claim-related data in separate places, such as bill review, claims systems, utilization review, medical case management and pharmacy or pharmacy benefits management (PBM).
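As a rough illustration of what integrating silos can involve, the sketch below groups records from three sources under a shared claim identifier. It assumes every silo carries a common `claim_id` key, which is often not true in practice. All field names, sample records and the `integrate` helper are hypothetical and do not come from any particular claims system.

```python
from collections import defaultdict

# Hypothetical extracts from three silos; field names are illustrative only.
claims_system = [
    {"claim_id": "C-100", "injury_date": "2015-03-02", "adjuster": "A. Lee"},
    {"claim_id": "C-101", "injury_date": "2015-04-17", "adjuster": "B. Cruz"},
]
bill_review = [
    {"claim_id": "C-100", "billed_amount": 1250.00},
    {"claim_id": "C-100", "billed_amount": 310.50},
    {"claim_id": "C-102", "billed_amount": 89.99},
]
pharmacy = [
    {"claim_id": "C-101", "drug": "ibuprofen"},
]


def integrate(silos):
    """Group the records from every silo under their shared claim_id."""
    merged = defaultdict(lambda: {name: [] for name in silos})
    for name, records in silos.items():
        for record in records:
            merged[record["claim_id"]][name].append(record)
    return dict(merged)


integrated = integrate(
    {"claims": claims_system, "bills": bill_review, "pharmacy": pharmacy}
)

# Claim IDs that appear in another silo but have no claims-system record.
orphans = sorted(cid for cid, view in integrated.items() if not view["claims"])
print(orphans)
```

Even in this toy data set, integration surfaces a problem: a bill that references a claim the claims system does not know about.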
While integrating data is an achievable task, other issues remain. Unfortunately, much of the existing data in this industry has quality issues. Data entry errors, omissions and duplications occur frequently and, if left uncorrected, will naturally become part of big data. Poor data quality is amplified when it is promoted to big data.
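The three problem types named above (entry errors, omissions and duplications) can each be checked for mechanically. The following is a minimal sketch, assuming records arrive as dictionaries and dates are expected in `YYYY-MM-DD` form. The required fields, the sample records and the `audit` function are hypothetical, chosen only to show the idea.

```python
from datetime import datetime

# Hypothetical list of fields every record is expected to carry.
REQUIRED_FIELDS = ("claim_id", "injury_date", "body_part")


def audit(records):
    """Return the indexes of records with omissions, duplicates or bad dates."""
    issues = {"missing": [], "duplicate": [], "invalid_date": []}
    seen = set()
    for index, record in enumerate(records):
        # Omissions: a required field is absent or blank.
        for field in REQUIRED_FIELDS:
            if not str(record.get(field) or "").strip():
                issues["missing"].append((index, field))

        # Duplications: same values after trimming spaces and ignoring case.
        key = tuple(
            str(record.get(field) or "").strip().lower()
            for field in REQUIRED_FIELDS
        )
        if key in seen:
            issues["duplicate"].append(index)
        seen.add(key)

        # Entry errors: a date that is present but not a real calendar date.
        date_text = str(record.get("injury_date") or "").strip()
        if date_text:
            try:
                datetime.strptime(date_text, "%Y-%m-%d")
            except ValueError:
                issues["invalid_date"].append(index)
    return issues


sample = [
    {"claim_id": "C-100", "injury_date": "2015-03-02", "body_part": "Knee"},
    {"claim_id": "C-100", "injury_date": "2015-03-02", "body_part": "knee "},
    {"claim_id": "C-101", "injury_date": "2015-02-30", "body_part": "Back"},
    {"claim_id": "C-102", "injury_date": "", "body_part": "Wrist"},
]
report = audit(sample)
print(report)
```

Checks like these catch only what can be detected by rule. A plausible but wrong value, such as the wrong body part, still needs a human or a second data source to catch.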
The reason big data is so attractive is that it provides the quantity of data necessary for reliable analytics and predictive modeling. More data is better because analysis is statistically more valid when it is informed by more occurrences. Nevertheless, greater volumes of data cannot produce the desired information if the data is wrong.
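A small simulation can make this point concrete. In the sketch below, claim costs are drawn from an invented distribution with a true mean of 1,000, and 5% of the "dirty" records are assumed to have been keyed with a misplaced decimal (ten times too large). The distribution, the error rate and the error type are all assumptions made for illustration, not measurements from real workers' compensation data.

```python
import random

TRUE_MEAN = 1000.0   # assumed true average cost per claim
ERROR_RATE = 0.05    # assumed share of records with a misplaced decimal


def average_cost(n, dirty, rng):
    """Average of n simulated claim costs, optionally with keying errors."""
    total = 0.0
    for _ in range(n):
        cost = rng.gauss(TRUE_MEAN, 200.0)
        if dirty and rng.random() < ERROR_RATE:
            cost *= 10  # decimal point keyed one place too far right
        total += cost
    return total / n


rng = random.Random(2015)  # fixed seed so the run is repeatable
for n in (100, 10_000, 100_000):
    clean = average_cost(n, dirty=False, rng=rng)
    dirty = average_cost(n, dirty=True, rng=rng)
    print(f"n={n:>7}  clean={clean:8.1f}  dirty={dirty:8.1f}")

clean_large = average_cost(100_000, dirty=False, rng=rng)
dirty_large = average_cost(100_000, dirty=True, rng=rng)
```

As the sample grows, the clean average settles close to the true mean, while the dirty average settles just as firmly on a wrong value (about 1,450 under these assumptions). More data reduces random noise, but it does not remove systematic error.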
Predicting that a devastating earthquake will occur in the next 25 years does not generate urgency. Likewise, knowing “clean” big data will be needed to remain competitive and viable in the future does not inspire aggressive corrective action now. But it should.
Correcting smaller data sets is easier than trying to fix huge data sets. It may not even be possible to adequately cleanse big data. Better still is preventing erroneous data from being entered in the first place. Data quality should be valued. Those responsible for collecting data, whether manually or electronically, should be held accountable for its accuracy. Existing data should be evaluated and corrected now to create complete and accurate data.
Doing so will speed migration to big data without drowning in big bad data.