June 10, 2016
It’s Time for a New Look at Metadata
Metadata does not just represent an arduous maintenance task; it can be a gold mine of opportunity and time-saving.
In today’s search for bigger and bigger big data, I fear metadata is getting overlooked.
Put that way, it sounds ironic. Given that many organizations are almost desperate to mine their data assets for advantage, why would anyone intentionally overlook metadata?
But, in fact, this is nothing new. Looking back to the first data warehouse project I successfully led in the 1990s, I remember that it was an uphill struggle to make progress on metadata then, too. (Those of a certain age may recall lots of enthusiasm for data warehouses and data mining.) But the less glamorous aspects, data quality management and metadata, were all too often set aside to save money or hit deadlines.
At its most basic, this is just human nature. We get bored or distracted easily and crave reinforcement or short-term reward to persist with tasks. But metadata just might turn round and “bite you on the bum” if you ignore it for too long.
Let me explain, briefly, why I think this matters:
The domain knowledge gap
Data science and analytics work relies not just on robust coding and appropriate use of statistics but also on an understanding of the real world being explored through the proxy of data. Too many projects fail to have any impact in their organizations because their interpretations or recommendations are naive or irrelevant (as is obvious to those who actually know what is going on around them). Put simply, metadata is just data about data, and knowing what variables mean really does matter when designing and interpreting analysis.
In fact, metadata can help to bring your data scientists or analysts closer to the real data issues as part of their induction. Understanding the data landscape, perennial problems and the causes of systemic data quality pitfalls can greatly improve their later analysis. At the very least, that understanding opens their eyes to possible data sources and to people with expertise.
Short-termism always robs effectiveness, and often efficiency
Any time saved (or boredom avoided) by skipping the work of creating and maintaining data dictionaries and reference data is only a short-term gain. Over the longer term, you often see repeated work, or the extra cost of fixes, because the initial analysis lacked a proper understanding of what the data items meant. At the most extreme, findings can be directionally wrong and misleading if they are built on the shaky foundation of misinterpreted data items.
However, I should also point out that metadata is not just an arduous maintenance task; it can be a gold mine of opportunity and time-saving in the medium term. Information about data that can easily be updated by the people who use those data items, and who discover their meanings, problems, gaps and workarounds, can save time and, in some cases, even feel life-saving. Empowering your analysts to share, in a collaborative working ecosystem, the most up-to-date understanding of what each data item means, which data quality issues to avoid and which workarounds or alternative data to use is very powerful.
GDPR may force a metadata revival
Although this topic has been buzzing around in my brain for months, I was prompted to post about it after reading an article in Data IQ magazine. The flamboyant editor, David Reed, rightly explains that one of the implications of the EU’s General Data Protection Regulation (GDPR) will be a need for better metadata. To evidence permissions and to enable rights such as the right to be forgotten, data owners, controllers and processors will need date and time stamps for data items. They will also need better records of what data items mean and how they were obtained, probably with expiration dates as well.
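To make that concrete, here is a minimal sketch of what a metadata record for a single data item might hold: its meaning, how it was obtained, a capture time stamp and an optional expiration date. The class name, field names and the two-year retention period are hypothetical illustrations, not a GDPR compliance design or any particular product’s schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DataItemMetadata:
    """Hypothetical metadata record for one data item (illustrative fields only)."""
    name: str
    meaning: str                           # business definition of the data item
    source: str                            # how and where the data was obtained
    captured_at: datetime                  # date and time stamp of capture
    expires_at: Optional[datetime] = None  # retention limit, if one applies

    def is_expired(self, now: datetime) -> bool:
        """Return True if a retention limit exists and has been reached."""
        return self.expires_at is not None and now >= self.expires_at


captured = datetime(2016, 6, 10, tzinfo=timezone.utc)
email = DataItemMetadata(
    name="customer_email",
    meaning="Primary contact email address supplied by the customer",
    source="Online quote form, with marketing consent recorded",
    captured_at=captured,
    expires_at=captured + timedelta(days=730),  # assumed two-year retention
)

print(email.is_expired(datetime(2017, 1, 1, tzinfo=timezone.utc)))  # False
print(email.is_expired(datetime(2019, 1, 1, tzinfo=timezone.utc)))  # True
```

Even a record this simple answers the questions GDPR raises: what the item means, where it came from, when it was captured and when it should no longer be held.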
So, whether it’s to avoid costly mistakes, to help your analysts be more efficient or to ready your organization for GDPR, please reconsider your need for metadata. It just might be the biggest improvement to your data that you can make.
What about your experience? Have you seen the benefits of accurate metadata, or do you have war stories caused by lack of metadata? Please do share.