November 16, 2021

How to evaluate a data catalog

by
Grant Seward
Founding Engineer

The data catalog is becoming a ubiquitous part of the data ecosystem, so much so that the new data stack is now known by its three pillars: the warehouse, the BI tool and the data catalog. However, many organizations still struggle to wrangle their data in a way that allows users to answer the most basic questions: what data exists and what does it mean? 

Evaluating and procuring a data catalog is a daunting next step. There is a plethora of options on the market, and historically many catalog implementations have failed due to a mismatch between the product and the company. We’ve identified the five pillars of selecting the right data catalog for your organization; by tackling each of these items you will be able to de-risk your data catalog implementation and start getting more value out of your data.

1. Alignment with data culture

The most important aspect of selecting a data catalog is to choose the product that is best positioned to enable your data culture. The catalog can support your team’s culture, but it cannot create one on its own.

There are many different data catalogs, and just about every one provides the full set of capabilities expected from a data catalog: indexing metadata, search, descriptions and tagging, classification and more. The main difference between data catalogs is therefore not their feature set but whether they expect to work within your culture or to define it. These differences can be succinctly summarized by grouping catalog providers into two categories: traditional and new-age.

A more traditional data catalog will support cultures that require:

  • More process and approval for documentation
  • Integration with legacy systems
  • Dedicated roles for data stewards to keep documentation updated

New-age data catalogs are better suited for:

  • Augmenting workflows with automation that prompts users for input when required
  • Integration with the modern data stack: Snowflake, BigQuery, Redshift, Mode, Looker, Tableau, Slack, GitHub, etc.
  • Providing automated insights into who is using your data and how

2. Identify your success criteria

Selecting a data catalog is an ongoing investment for your organization and, as with any investment, you should identify what success looks like and maximize your likelihood of achieving it. Determine the outcomes your company needs in order for your data catalog to be successful. Some examples may be:

  • Build trust in your data among the analysts and other end users
  • Allow data scientists to be onboarded and to ship their first feature faster
  • Provide centralized data steward / governance team controls to manage metadata quality
  • Reduce the time to research the impact on your dashboards after data pipeline failures
  • Eliminate all duplicative metrics and reports across your BI tool(s) and data teams
  • Ensure all certified data assets are documented

These outcomes can be mapped directly to the features that catalog providers offer, allowing you, in turn, to directly compare data catalog providers based on the results that are important to you.
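One way to make this comparison concrete is a simple weighted scorecard. The sketch below is only an illustration: the outcome names, weights, provider names ("Catalog A", "Catalog B") and ratings are hypothetical placeholders, and the weighted average is one possible scoring approach among many, not a prescribed method.

```python
# Hypothetical example: outcome names, weights, providers, and ratings are
# illustrative assumptions, not data from any real catalog evaluation.

def score_providers(weights, ratings):
    """Return (provider, weighted average rating) pairs, best first.

    weights: outcome -> importance (any positive number)
    ratings: provider -> {outcome -> rating from 0 to 5}
    """
    total_weight = sum(weights.values())
    results = []
    for provider, provider_ratings in ratings.items():
        # A missing rating counts as 0, so gaps lower the score.
        weighted = sum(
            weight * provider_ratings.get(outcome, 0)
            for outcome, weight in weights.items()
        )
        results.append((provider, weighted / total_weight))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# How much each outcome matters to your organization (assumed values).
weights = {
    "build trust in data": 3,
    "faster onboarding": 2,
    "faster impact research": 1,
}

# How well each provider's features support each outcome (assumed values).
ratings = {
    "Catalog A": {
        "build trust in data": 4,
        "faster onboarding": 3,
        "faster impact research": 5,
    },
    "Catalog B": {
        "build trust in data": 5,
        "faster onboarding": 2,
    },
}

ranking = score_providers(weights, ratings)
for provider, score in ranking:
    print(f"{provider}: {score:.2f}")
```

Agreeing on the weights before you look at any vendor keeps the comparison anchored to your own success criteria rather than to whichever demo was most polished.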


3. Review the options 

There are many factors that will go into determining whether a catalog will be a good fit for your organization. The number of data users in your company, the technology in your data ecosystem and security requirements for integration are only a few of the first-order considerations. Identify the options in the market that you believe are best suited for your company. Take into account the product's current offering as well as the support and roadmap behind the product. The needs you have for a catalog today will likely shift and change within a few years; you'll want a catalog that evolves as quickly as your business.

We know that it can be difficult to come up with the criteria for evaluating different catalogs. After working with hundreds of companies, we've put together The Ultimate Data Catalog Checklist to help you kick-start your evaluation process.

4. Support, guidance and engagement

The data catalog can only be as useful as the information put in and the processes placed around it. A key factor in the initial success of a data catalog installation is how users are onboarded and how data is bootstrapped into the catalog. Catalog providers should be willing to provide training, hands-on guidance and support tailored to the nuances that are unique to your data and your team’s culture.

To get a sense of the support you will receive from a particular provider, ask to speak to the individuals who will support you while you’re a customer. The support and implementation team should be familiar with the tools you are using and should deeply understand the problems you are trying to solve with a data catalog. Be on the lookout for support engineers who are highly technical and who have previously walked in your shoes; this is a positive sign that the data catalog provider is willing to invest in your long-term success!

In addition, look for catalog providers that have extensive experience supporting companies that are similar to your own. The challenges and experiences from a catalog implementation vary widely with the size and culture of each company, so the lessons learned by data catalog support teams will be dramatically different.

Still can't decide? Run a proof of concept

If you have vetted the data catalogs you believe are the best fit and still cannot select a single provider, it may be worthwhile to run a proof of concept. Selecting the data catalog that is right for your organization is a risky decision and historically many implementations have failed due to a mismatch between the catalog and the users. The best way to de-risk the procurement and implementation of a catalog is to run a proof of concept to ensure that the capabilities and features of the catalog align with your user base’s expectations.

Select the one or two catalogs you feel are best for you and work with those providers to clearly define the timeline and the outcomes you want to achieve during your proof of concept. We recommend that the POC run for about two weeks in order to allow your users to fully explore the features and to get a sense of how the tool fits into their everyday work. During the POC you should expect full support and engagement from each provider's onboarding team; this time should be used to evaluate both the product and the support.

