August 17, 2022

Open standards are not the answer

by
Mark Grover
Co-founder, CEO of Stemma

Earlier this week, dbt Labs CEO Tristan Handy wrote a post making the case for new standards to solve the metadata problem. My goal here is to continue the conversation. In Tristan's words:
"In a sufficiently complex organization, it is not good enough to find a table called customers in your warehouse—you need to know how it was produced, who built it, when it was updated, etc. in order to make use of it."

I could not agree more with this premise. The Modern Data Stack has led to two phenomena:
a) Reducing the barrier to entry for producing and consuming data
b) Creating best of breed products

These in turn lead to a lot more data, but also a lot more chaos. More chaos means more questions like the ones below, referred to as the ABC of metadata [1]:
Application Context - Where is the data? What are the semantics of the data?
Behavior - Who is using the data, who created the data?
Change - How has this data evolved over time?

Tristan then describes the real barrier to solving this problem, which I also agree with:
"the hard problem is: how do you get an entire ecosystem of vendors to build products that answer these questions?"

Now, the point that I disagree with is the following:
"It will either be solved by product integrations in an open ecosystem (requiring standards) or via commercial consolidation."

There is a third way - the integration service.

Let’s take an example. The average organization today uses dozens of SaaS applications [2]. These applications need to send data to the data warehouse, which we use for understanding, running, and evolving our businesses. However, these SaaS services are neither consolidated nor built on a widely adopted standard. This is true for Salesforce, Marketo, and any other application that commonly sends data to the warehouse. Instead, data integration services like Fivetran and Airbyte offer hundreds of connectors that extract data from disparate SaaS services and load it into the warehouse. On top of that, there are now vendors that reverse ETL your data from the warehouse back into these same SaaS applications.

This activity is not just happening between SaaS applications and your warehouse. For example, Segment, a customer data SaaS, instruments and ingests data from disparate web and mobile sources.

So, the interesting question is what circumstances lead to commercial consolidation vs. open standards vs. integration services?

There are at least two factors:
- Maturity: Less mature products tend to focus more on their core use-cases and less on integrations.
- Purchasing patterns: Do users involved in making a purchase decision value adherence to a standard, or do they just care that integration is handled and effective?

Bringing these back to the metadata problem:
- Maturity: The modern data stack is less mature than the major SaaS products I mentioned earlier. This means that, more often than not, vendors will focus on improving and scaling their core use-cases in the near term.
- Purchasing patterns: More often than not, users want effective integration but do not care how it is done. They do not care about adherence to a particular standard and are open to integration services.

I have a biased view [3], but my prediction is that the metadata problem will be solved the same way the data problem was. Just as Fivetran and Airbyte act as data integration services, there will be metadata services that ingest and aggregate metadata from multiple products, make recommendations based on it, and enhance the overall experience for data users.
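To make the pattern concrete, here is a minimal sketch of what such a metadata integration service might look like. It is an illustration only, not how Stemma, Fivetran, or Airbyte are actually built. The `TableMetadata` schema, the `Connector` interface, and both connector classes are hypothetical names invented for this example, and the records they return are hard-coded stand-ins for real API calls.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Protocol, Set


@dataclass
class TableMetadata:
    """Merged view of one table, built from several products (hypothetical schema)."""
    name: str
    owners: Set[str] = field(default_factory=set)
    upstream: Set[str] = field(default_factory=set)
    sources: List[str] = field(default_factory=list)


class Connector(Protocol):
    """One connector per product; each one hides that product's own API."""
    source: str

    def extract(self) -> Iterable[dict]: ...


class WarehouseConnector:
    """Hypothetical connector: a real one would query the warehouse."""
    source = "warehouse"

    def extract(self) -> Iterable[dict]:
        yield {"table": "customers", "owners": ["ana"]}


class TransformConnector:
    """Hypothetical connector: a real one would read a transformation tool's lineage."""
    source = "transform_tool"

    def extract(self) -> Iterable[dict]:
        yield {"table": "customers", "upstream": ["raw_customers"]}


def aggregate(connectors: Iterable[Connector]) -> Dict[str, TableMetadata]:
    """Ingest records from every connector and merge them per table."""
    catalog: Dict[str, TableMetadata] = {}
    for connector in connectors:
        for record in connector.extract():
            entry = catalog.setdefault(record["table"], TableMetadata(record["table"]))
            entry.owners.update(record.get("owners", []))
            entry.upstream.update(record.get("upstream", []))
            entry.sources.append(connector.source)
    return catalog


catalog = aggregate([WarehouseConnector(), TransformConnector()])
print(catalog["customers"])
```

The point of the sketch is that neither product has to agree on a shared standard: each connector translates its own product's format, and the service does the merging. Supporting a new product means writing one more connector, not negotiating a specification across vendors.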

Only time will tell who is right. If you think there are other important factors besides the two I mention, I would like to hear them. What do you think?

[1]: Terminology from Ground, A Data Context Service

[2]: Statista, Average number of software as a service (SaaS) applications used by organizations worldwide from 2015 to 2021, accessed August 16, 2022

[3]: I am the co-founder of Stemma, which is a metadata integration service (aka data catalog). Prior to that, I co-created Amundsen, which did just that at Lyft.
