September 14, 2021
June 23, 2021 - 5 min read

How to Make Your Data Catalog Successful

by Mark Grover, Co-founder and CEO of Stemma

Too many data catalog installs fail to get adoption. I take a stab at what makes a data catalog install successful, with learnings from dozens of companies.

Only two goals matter when measuring the success of a data catalog: 1) adoption, and 2) customer satisfaction. If you nail these two, you are successful.

I’m the co-creator of the leading open-source data catalog, Amundsen, which is used by 35+ companies including Instacart, Square, Brex, Asana and many more. In this post, I share key learnings from Lyft, other Amundsen adopters, and Stemma customers on what makes a data catalog install successful.

We have already incorporated some learnings into Stemma; this article covers the ones that the product does not capture yet. They focus on how to launch the product, how to land it for great adoption, and how to measure success.

1. Prioritize a persona and its use-cases

There are many user personas and use cases for a data catalog. Successful installs prioritize which personas and use cases to focus on first. Here’s a simplified view¹ of the most common personas and use cases for a data catalog. Which persona you start with matters less than starting with a specific target group of users.

Image by author: Most common personas and use cases for a data catalog.

2. Launch in phases

In this section, I’ll dive deeper into best practices for launching your data catalog.

Step 1: Identify a small set of tables to get alpha user feedback on.

  • This set can be the most commonly used tables within the company (often referred to as “core” tables) or the tables of one domain within the company, such as marketing, growth, or finance.
  • More often, I have seen core tables being the chosen set, partly because they are the most impactful, but also because there’s often a central data team responsible for maintaining them.
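
If you don't already know which tables count as "core", one way to shortlist candidates is to rank tables by how many distinct people query them. The sketch below is only an illustration: it assumes you can export a query log as (user, table) pairs, and the function name `pick_alpha_tables` and the sample table names are hypothetical.

```python
from collections import defaultdict


def pick_alpha_tables(query_log, top_n=20):
    """Rank tables by number of distinct users and return the top_n names.

    query_log is an iterable of (user, table_name) pairs. That shape is an
    assumption; adapt it to whatever your warehouse's query history provides.
    """
    users_by_table = defaultdict(set)
    for user, table in query_log:
        users_by_table[table].add(user)
    # Sort by distinct-user count (descending), then by name for stable ties.
    ranked = sorted(users_by_table, key=lambda t: (-len(users_by_table[t]), t))
    return ranked[:top_n]


# Hypothetical sample log.
log = [
    ("ana", "core.rides"),
    ("bo", "core.rides"),
    ("cy", "core.rides"),
    ("ana", "core.users"),
    ("ana", "tmp.scratch"),
    ("ana", "tmp.scratch"),
]
shortlist = pick_alpha_tables(log, top_n=2)
```

Counting distinct users rather than raw queries keeps a single scheduled job from pushing a rarely used table onto the list.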

Step 2: Populate MVP metadata on these tables.

  • This is where most data catalogs fail. For users to get value out of a catalog, descriptions, tags, owners, etc. need to be curated. However, manual curation isn’t sustainable without an army of data stewards, and the documentation quickly goes out of date. Avoid this pitfall by choosing an automated data catalog for the majority of data and curating only the most impactful data by hand.
  • Where you must curate manually, for example to capture tribal knowledge, it helps to hold a “docs jam session” with a group of data producers and consumers. You can even offer a reward (like a gift card) to those who contribute the most docs!

Step 3: Alpha launch to 5-20 alpha users. 

  • It’s best for alpha users to be ultra-vocal users. These will be from the prioritized persona you chose earlier. These users will become the data catalog’s avid supporters when you launch to a broader audience.
  • Incorporate feedback and iterate. Some types of feedback are super valuable here, like when someone says, “Oh, we already have this metadata in this spreadsheet — we should pull that in here, too.” 

Step 4: Beta launch to all users of the prioritized persona.

  • It’s important to focus your beta launch on your prioritized users (data consumers, for example). One common mistake is to dilute the focus of your launch by opening up to all personas. That doesn’t mean you should lock other personas out of the data catalog; it just means that you sequence which personas to focus on first.
  • Graduate to GA once you meet your success metric targets. More on that in a later section on measuring success.

3. Land for great adoption

In order to get great adoption, here are a few best practices I have seen work:

  • Update Slack channel headers. In the channels where people ask each other data questions, update the header to point to the catalog. Product features can be super helpful here, for example if your catalog has a Slack integration that can link these conversations to the catalog automatically.
  • Embed into new hire training. Tagging data sets per domain (marketing, growth, etc.) can help new hires quickly onboard to their domains. If you have an existing training, showcase the catalog as an entry point. At Lyft, we had all tech new hires instrument a metric during onboarding. They used Lyft’s data catalog for discovering and understanding the right data for that task.
  • Link to other products. Create links between your data tools. For example, auto-populate a link from the Airflow DAG that populates a table to the table’s page in the data catalog (and vice versa). Another impactful link is from the table page in the data catalog to the code that generates the table.
  • Showcase the catalog at a group or company meeting. Deliver a short 5-minute demo at an all-hands meeting that targets your persona’s users. Educate, answer questions, and thank your alpha users. This creates awareness and gives you an opportunity to learn.
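
To make the linking idea above concrete, here is a minimal sketch that builds both halves of a table-to-DAG link from a simple mapping. Everything specific in it is an assumption: the URL templates are placeholders, not the real URL schemes of Airflow or any catalog, and `build_links` is a hypothetical helper. Substitute the patterns your own tools use.

```python
from urllib.parse import quote

# Hypothetical URL templates; replace with the patterns your own tools use.
CATALOG_TABLE_URL = "https://catalog.example.com/table/{table}"
AIRFLOW_DAG_URL = "https://airflow.example.com/dags/{dag_id}"


def build_links(table_to_dag):
    """Return, per table, the catalog page URL and the URL of the DAG that fills it.

    table_to_dag maps a table name to the ID of the DAG that populates it.
    """
    links = {}
    for table, dag_id in table_to_dag.items():
        links[table] = {
            # Shown on the DAG's documentation, pointing at the catalog.
            "catalog_page": CATALOG_TABLE_URL.format(table=quote(table, safe="")),
            # Shown on the catalog's table page, pointing at the DAG.
            "airflow_dag": AIRFLOW_DAG_URL.format(dag_id=quote(dag_id, safe="")),
        }
    return links


links = build_links({"core.rides": "load_rides"})
```

Generating both URLs from one mapping keeps the two directions of the link from drifting apart.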

4. Measure success

As I said earlier, adoption and customer satisfaction are the only two goals that matter. Here are the specific metric definitions I recommend for each:

1. Adoption:

  • WAUs: I’d suggest starting with Weekly Active Users (WAUs) rather than Daily Active Users or Monthly Active Users, because people most commonly use a data catalog weekly, not daily or monthly.
  • Target penetration rate: 80%. A great penetration rate is 80% of the users within your target persona.
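
As a rough illustration of the two adoption metrics, the sketch below computes WAUs from a list of usage events and then a penetration rate. It assumes that penetration rate means the share of your target persona's users who were active in the week, and that usage events are available as (user, date) pairs. The function names and sample data are hypothetical.

```python
from datetime import date, timedelta


def weekly_active_users(events, week_end):
    """Return the set of users with at least one event in the 7 days ending on week_end."""
    week_start = week_end - timedelta(days=6)
    return {user for user, day in events if week_start <= day <= week_end}


def penetration_rate(active_users, target_users):
    """Share of the target persona's users who were active (assumed definition)."""
    target = set(target_users)
    if not target:
        return 0.0
    return len(set(active_users) & target) / len(target)


# Hypothetical usage events and target persona.
events = [
    ("ana", date(2021, 6, 21)),
    ("bo", date(2021, 6, 23)),
    ("cy", date(2021, 6, 10)),   # outside the week
    ("dee", date(2021, 6, 22)),  # active, but not in the target persona
]
analysts = ["ana", "bo", "cy", "eve", "fay"]

wau = weekly_active_users(events, week_end=date(2021, 6, 23))
rate = penetration_rate(wau, analysts)  # compare against the 0.8 target
```

Restricting the numerator to the target persona keeps activity from other personas from inflating the rate.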

2. Customer Satisfaction (CSAT):

  • Measure out of band, periodically. In my experience, CSAT feedback collected out of band (not in the product) every 3 or 6 months is better than feedback collected within the data catalog product. I have learned that when feedback is collected in the product, the user’s most recent experience can skew it.
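
One common way to turn survey responses into a single CSAT number is the share of respondents who give a "satisfied" rating. The sketch below assumes a 1 to 5 rating scale where 4 and 5 count as satisfied. That convention is an assumption rather than something prescribed here, and `csat_score` is a hypothetical helper.

```python
def csat_score(ratings, satisfied_threshold=4):
    """Return the fraction of ratings at or above satisfied_threshold.

    Assumes ratings are on a 1-5 scale. Returns None when there are no
    responses, so an empty survey is not mistaken for a score of zero.
    """
    ratings = list(ratings)
    if not ratings:
        return None
    satisfied = sum(1 for r in ratings if r >= satisfied_threshold)
    return satisfied / len(ratings)


# Hypothetical responses from one periodic survey.
score = csat_score([5, 4, 3, 2, 5])
```

Computing the score per survey round makes it easy to compare one 3- or 6-month period with the next.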

There are a few other metrics that companies often consider: documentation quality, search quality, etc. However, my recommendation is to stick to the core metrics at the outset. As your data catalog matures and you ingest more metadata over time, you can add those specific metrics to track the impact of each improvement.

I hope this step-by-step guide helps inform you and your team as you navigate your data catalog install and makes your data catalog successful. The right data catalog can greatly reduce the overhead of curation. However, the above steps still play a huge role in ensuring your success, regardless of the data catalog you choose.

Want to learn more about Stemma’s fully managed data catalog? Reach out to Mark Grover or the team at Stemma.

¹ Another relevant persona is a product engineer, who often needs to put this metadata in the right places so a downstream catalog can pick it up. The use-case list is not exhaustive.

