3 Benefits of Data Documentation using Data Dictionary, Catalog, and Business Glossary

Anshuman Lall
Predmatic
3 min read · Apr 3, 2023


Disclaimer: I used ChatGPT as a starting point for this blog :). It turns out that editing and rewriting the auto-generated content took me a lot of time. I wonder whether it would have taken less time and effort if I had written my thoughts down directly.

I spent over a decade in consulting, where I solved business problems using data. In many cases, we partnered with our clients to better understand existing data and to “create” new data, such as a forecast.

One of the key lessons, and the #1 request from our clients, was to document the data and processes.

The “Why”

Here are the top 3 reasons behind this:

1. Documentation hedges against the risk of losing institutional knowledge when someone (an employee or a consultant) leaves the company.

2. If the data is not documented, even the team that worked on it might not remember the exact intent and calculations behind certain data.

3. New employees or other departments at the company might benefit from the data documentation.

The “What”

So what data needs to be documented? Here are the three types of data documentation.

When I began my career in data consulting, I didn’t fully understand the difference between them. As I got to know my audience, I came to appreciate why each type of data documentation matters.

Business Glossary: A business glossary is a set of terms specific to a business domain. It serves as a shared vocabulary across different departments within a company.

The business glossary helps ensure that everyone in the organization is aligned and understands the meaning of business terms consistently.

Data Dictionary: A data dictionary is a technical document. It provides a comprehensive description of the data elements used in a database or other data storage system. It includes information such as the name of each data element (header, column name, etc.) and its description (what it means).
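To make this concrete, here is a minimal Python sketch of a data dictionary that stores a name and a description for each data element, then flags any column in a table that nobody has documented yet. The names DataElement, build_data_dictionary, and undocumented_columns, as well as the sample columns, are assumptions I made up for illustration. They do not come from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One data dictionary entry: a column name and what it means."""
    name: str
    description: str


def build_data_dictionary(elements):
    """Index the entries by column name for quick lookup."""
    return {element.name: element for element in elements}


def undocumented_columns(columns, data_dictionary):
    """Return the columns that have no entry in the data dictionary."""
    return [column for column in columns if column not in data_dictionary]


# Hypothetical entries for an illustrative sales forecast table.
forecast_dictionary = build_data_dictionary([
    DataElement("order_date", "Calendar date on which the order was placed."),
    DataElement("net_revenue", "Revenue after discounts and returns, in USD."),
])

# Hypothetical column headers pulled from the table itself.
table_columns = ["order_date", "net_revenue", "fcst_units"]

missing = undocumented_columns(table_columns, forecast_dictionary)
print(missing)  # ['fcst_units']
```

A check like this could run whenever a table changes, so that a new column such as fcst_units gets a description while the person who created it still remembers what it means.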

Data Catalog: A data catalog is a centralized tool that provides a comprehensive inventory of data assets across the company. This includes data sets, databases, data pipelines, data models, and data flows.
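As a rough illustration, a data catalog can be thought of as a searchable inventory of assets. The Python sketch below is only a toy model under my own assumptions: the DataAsset and DataCatalog classes, their fields (owner, tags), and the sample assets are hypothetical, and real catalog tools offer far more than this.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One inventory entry in the catalog."""
    name: str
    asset_type: str  # e.g. "data set", "database", "data pipeline"
    owner: str       # hypothetical field: the team responsible for the asset
    tags: list = field(default_factory=list)


class DataCatalog:
    """A central registry of data assets, keyed by asset name."""

    def __init__(self):
        self._assets = {}

    def register(self, asset):
        """Add an asset to the inventory, or overwrite one with the same name."""
        self._assets[asset.name] = asset

    def find_by_type(self, asset_type):
        """Return the sorted names of all assets of the given type."""
        return sorted(
            asset.name
            for asset in self._assets.values()
            if asset.asset_type == asset_type
        )

    def search(self, keyword):
        """Return the sorted names of assets whose name or tags contain the keyword."""
        keyword = keyword.lower()
        return sorted(
            asset.name
            for asset in self._assets.values()
            if keyword in asset.name.lower()
            or any(keyword in tag.lower() for tag in asset.tags)
        )


# Hypothetical assets, for illustration only.
catalog = DataCatalog()
catalog.register(DataAsset("sales_orders", "data set", "Finance", ["sales"]))
catalog.register(
    DataAsset("demand_forecast_pipeline", "data pipeline", "Analytics", ["forecast", "sales"])
)
catalog.register(DataAsset("warehouse_db", "database", "IT"))

print(catalog.find_by_type("data pipeline"))  # ['demand_forecast_pipeline']
print(catalog.search("sales"))  # ['demand_forecast_pipeline', 'sales_orders']
```

The point of the sketch is the single place to look: a new employee can search for "sales" and discover both the data set and the pipeline that depends on it, without knowing who built them.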

The “How”

Image from the Star Trek episode “Darmok”, in which establishing meaningful communication with an alien species was a challenge and the universal translator didn’t work.
