6 Limitations of Big Data in Healthcare

No. 5: Miscoding can render analysis meaningless -- one company didn't know it was paying for 25 organ transplants a year.

Tom Emerick

David Toomey

March 22, 2016

Claims data captures the services provided to a patient. This information can be grouped into different cohorts: those getting preventive exams, those seeing specific physicians or hospitals for conditions, etc. The data can also be grouped by diagnosis. However, all claims data is just a collection of medical bills. Medical bills do not give a complete picture of the patient; they omit, for example, important information about a patient’s prognosis. That’s a gap. Thus, it is important to set appropriate expectations on the use of the data. Here are six limitations that should temper those expectations:

Number 1 (one of the most important): Avoid the averages

Most claims data sets are not normally distributed, so the averages do not provide relevant information. In most discussions today, employers evaluate the average cost of employees with specific conditions, e.g., diabetes or high blood pressure. This is a flawed approach because spending by employees with various chronic conditions is skewed, thus not really “averageable.” For example, assume 90% of an employee population with diabetes is spending $10,000/year and 10% is spending $250,000/year; the average will be a meaningless $34,000/year. All too often, a wild goose chase ensues, when in fact the focus should be on the $250,000 cohort to understand why they were so much more expensive.

Number 2: Follow the money

A superior use of claims data is to look at distributions of spending. In most plans today, roughly 8% of enrollees are consuming 80% of plan dollars, and the members of that 8% typically change every 12 to 18 months. (We still run into benefit managers who were unaware of that turnover.) The future belongs to micro-managing these “outliers,” rather than the 92% who spend only 20% of the dollars.
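To make the contrast between the average and the distribution concrete, here is a minimal Python sketch. It reuses the hypothetical diabetes cohort from Number 1 (90% of members at $10,000/year, 10% at $250,000/year), scaled to 100 members for illustration. The helper name `top_share` is our own invention, not part of any claims-analysis tool.

```python
from statistics import mean, median

def top_share(costs, fraction):
    """Share of total spending attributable to the costliest `fraction` of members."""
    ranked = sorted(costs, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # number of members in the top group
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical cohort of 100 members, taken from the article's example:
# 90 members at $10,000/year and 10 members at $250,000/year.
costs = [10_000] * 90 + [250_000] * 10

avg = mean(costs)               # the "meaningless" average
mid = median(costs)             # what the typical member actually spends
share = top_share(costs, 0.10)  # spending concentrated in the top 10%

print(f"mean:   ${avg:,.0f}")
print(f"median: ${mid:,.0f}")
print(f"top 10% of members account for {share:.1%} of spending")
```

In this toy cohort no member spends anything close to the mean, while the top 10% of members account for roughly three-quarters of the dollars. On real claims data, the same two numbers (the median and the top-group share) would likely be a better starting point than the average.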
If you study those outliers carefully, you will find that only about 7% of their spending possibly would have been preventable, and then only if they had faithfully done what their doctors told them to do decades earlier. A cardiologist recently told me that, of the patients he has seen with a significant acute blockage, about 25% had no known health risks of any kind: no high blood pressure, cholesterol, diabetes, obesity, smoking, genetic predisposition, etc. As such, there is a component of randomness in terms of who gets blocked arteries. The same holds true for cancer. For the other 75%, their physicians have usually counseled them on the importance of exercise and nutrition and the dangers of tobacco use, but to no avail.

Number 3: Realize the limitations for quality designations

Yet another big error is trying to use claims data to determine the best-quality doctors. You had better be really, really talented to try that one. Why? We are in an era in which many doctors are making their “quality” and “outcomes” look better by referring their most complex and riskiest patients to someone else. (Much has been written about this.) On the other hand, there are highly effective doctors who take responsibility for their riskiest patients, but as a consequence score poorly on so-called “quality measures.” The real travesty is that the low-scoring doctors may be the most cost-effective and provide the best care.

Number 4: Misdiagnoses are a real cost driver

Another huge shortcoming of claims data is one that readers of Cracking Health Costs know about. Namely, a large number of patients with complex health problems are simply misdiagnosed. Today, that’s about 20% of the outliers in benefit plans, accounting for 18% of claim dollars. Thus, you cannot rely on diagnoses in claims data, and you cannot tell who is getting diagnoses right or wrong. That takes detective work beyond claims data. The Mayo Clinic has published a good article on rates of misdiagnoses.
We have sent hundreds of people to the Mayo Clinic for second opinions and can verify by personal experience the truth in that article. The same goes for other clinics we have used for employers. Our first rule in selecting a Center of Excellence is its success in correctly diagnosing patients with complex health problems. Huge amounts of claim dollars are spent on treatments or surgeries that are either completely erroneous or clearly suboptimal. An executive at a Fortune 100 company once said to me that the biggest quality failure in healthcare is to misdiagnose a patient, because everything that follows harms the patient.

Number 5: Coding can affect the data analysis

During a data analysis for a very large employer, with more than 250,000 covered lives, executives told me they had not paid for a solid organ transplant in a number of years. Based on their size, they should have been paying for about 25 a year. After further detective work, we discovered their consultant was using a DRG grouper that coded all transplants as ventilator cases. Who knows why, but it was a huge error. The benefit team had no idea they were really paying for about 25 a year at an average cost over five years of about $1.5 million each.

Number 6: Reversion to the mean

One thing we’ve learned from years of claims analysis of big companies’ benefit programs is that if you have enough life years of data, it all looks about the same, i.e., it reverts to the mean. If the workforce is comparatively older, they will have somewhat more high-cost claims.
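The transplant story under Number 5 suggests a simple sanity check: compare what the data reports against what a population of that size would be expected to generate. The sketch below is illustrative only. The rate of one transplant per 10,000 covered lives per year is merely what the article's figures imply (about 25 a year for roughly 250,000 lives), the Poisson model is our own simplifying assumption, and the function names are hypothetical.

```python
import math

def expected_count(covered_lives, rate_per_10k):
    """Expected events per year for a population of the given size."""
    return covered_lives * rate_per_10k / 10_000

def prob_at_most(observed, expected):
    """P(X <= observed) if X is Poisson-distributed with mean `expected` (an assumption)."""
    return sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed + 1)
    )

covered_lives = 250_000
rate_per_10k = 1.0       # implied by "about 25 a year"; not an actuarial benchmark
reported_transplants = 0  # what the miscoded data showed

expected = expected_count(covered_lives, rate_per_10k)
p = prob_at_most(reported_transplants, expected)

print(f"expected about {expected:.0f} transplants a year, data shows {reported_transplants}")
if p < 0.001:
    print("Reported count is implausibly low; check the coding before trusting the analysis.")
```

Under these assumptions, a year with zero transplants in a group this large is so unlikely that a coding or grouping error is a far more plausible explanation than good luck.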
