March 2, 2023
February 28, 2023
5 min read

Stemma announces its Data Developer Update for improved data pipelines with dbt

by Rob Lundy, Head of Product Marketing

Update includes features built on column-level lineage and an integration with dbt Cloud 

The Lineage Graph manages complexity with features like Focus Mode

We are proud to announce the Data Developer Update, which further improves how our dbt users manage their data development lifecycle.

The Data Developer Update includes:

  • The Lineage Graph for visualizing dbt model dependencies in real-world, complex environments that have many tables, columns, and dashboards 
  • Integration with dbt Cloud to automatically gather metadata, documentation and lineage information for use in Stemma 
  • Column “Autodescribe”, which allows teams to document columns once in dbt and propagate that documentation via column-level lineage to derived tables

The Lineage Graph for increased confidence while developing with dbt

With Stemma's column-level lineage we can easily trace data back to its source and validate/investigate any cleaning or transformation actions along the way to its end state. We used the feature to quickly assess lineage and impact after users reported mismatches with a transformed column in our EDW. We see it being quite useful for creating or enhancing data assets as well since it allows us to easily scope the upstream and downstream dependencies down to the column-level of detail.

Jordan East
Manager - Data Engineering, Analytics Engineering, Enablement
Workrise

A common use of Stemma is to build upon the lineage found in dbt docs. The lineage view in dbt docs shows table-level relationships and downstream artifacts defined using exposures. As dbt projects scale with data and organizational growth, the number of models and exposures invariably grows and becomes harder to audit. Also, many use cases require column-level lineage in addition to table-level lineage. This leads many dbt users to add an automated lineage tool to their dbt workflows to discover and document these relationships.

However, the next challenge these users face is the complexity of their own environment. While many tools and catalogs show clean lineage in screenshots and demos, they become difficult to use in real-world environments that have complex relationships between many tables that each contain many columns. 

We designed the Lineage Graph to cut through real-world complexity using:

  • Column-level filtering to restrict the view to column-to-column relationships
  • Chart-level filtering to filter upstream dependencies at the level of an individual chart within a dashboard
  • Focus Mode to make it easier to follow table-level relationships while retaining the context of the larger data environment
  • Enriched node information to make it easy to see when each dependency was last refreshed

Column filtering restricts the graph to column-level relationships

With the Lineage Graph, dbt users can speed up model iteration. A typical scenario for a dbt user is:

  1. The data team determines that they need to change a column type in a dbt model
  2. The team uses Stemma to clearly see which tables and dashboards have been built on that column
  3. They create a new column in dbt
  4. They use Stemma to alert users and downstream dependency owners to the change and recommend that they use the new column going forward
  5. Where possible, the team changes existing downstream dependencies to use the new column
  6. They use Stemma to confirm that all dependencies have moved to the new column
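
The impact-analysis part of this scenario (steps 2 and 6) can be sketched as a graph traversal. The example below is a minimal illustration, not Stemma's implementation: the edge list, the column names, and both helper functions are hypothetical, and it assumes column-level lineage is available as simple (upstream, downstream) pairs.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage edges: (upstream, downstream).
# Names are illustrative only and do not come from Stemma or dbt.
EDGES = [
    ("stg_orders.amount", "fct_orders.amount_usd"),
    ("fct_orders.amount_usd", "mart_revenue.total_revenue"),
    ("mart_revenue.total_revenue", "dashboard:Revenue Overview"),
    ("stg_orders.order_id", "fct_orders.order_id"),
]


def downstream_of(column, edges):
    """Return every asset reachable downstream of `column`, breadth-first."""
    children = defaultdict(list)
    for upstream, downstream in edges:
        children[upstream].append(downstream)

    found = []
    visited = {column}
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in visited:
                visited.add(child)
                found.append(child)
                queue.append(child)
    return found


def still_on_old_column(old_column, edges):
    """Direct dependents that still read from the old column (step 6)."""
    return [down for up, down in edges if up == old_column]


# Step 2: everything built on the column that is about to change.
impacted = downstream_of("stg_orders.amount", EDGES)
print(impacted)
```

Once downstream owners have moved to the new column, `still_on_old_column` returning an empty list for the old column would be the signal that the migration is complete.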

dbt Cloud integration

Our development cycle is much faster now that we use dbt Cloud and Stemma. We use dbt Cloud’s IDE and templates to easily write queries, and to seamlessly test and deploy our ELT pipelines. Stemma automatically detects and surfaces the downstream tables and dashboards for our dbt models, which gives us the confidence to iterate and update them quickly.

Chong Sun

Senior Director, Engineering
Tempo

Stemma has integrated with dbt Cloud to create an even more automated data development experience. Stemma connects with dbt Cloud’s API to automatically gather table and column descriptions, table tags, compiled queries, and model types from dbt and add them as context to tables in Stemma. It also gathers lineage information from dbt Cloud and adds it to Stemma’s own query parsing to infer reliable table-to-dashboard and table-to-model lineage.
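
To make the kinds of metadata listed above concrete, here is a rough sketch of flattening one dbt model's metadata into catalog context. It is an assumption-laden illustration: the sample dictionary is loosely modeled on a node in dbt's `manifest.json` artifact rather than on the dbt Cloud API response, the `to_catalog_entry` helper and its output keys are hypothetical, and treating the model's materialization as its "model type" is our guess at what that term covers.

```python
# Simplified, hypothetical metadata for one dbt model. The shape is loosely
# modeled on a node in dbt's manifest.json artifact; real payloads are larger
# and field names can differ between dbt versions.
SAMPLE_NODE = {
    "unique_id": "model.analytics.fct_orders",
    "name": "fct_orders",
    "description": "One row per completed order.",
    "tags": ["finance", "daily"],
    "config": {"materialized": "table"},
    "compiled_code": "select order_id, amount from analytics.stg_orders",
    "columns": {
        "order_id": {"name": "order_id", "description": "Primary key."},
        "amount": {"name": "amount", "description": ""},
    },
    "depends_on": {"nodes": ["model.analytics.stg_orders"]},
}


def to_catalog_entry(node):
    """Flatten one model node into the context fields a catalog might store."""
    return {
        "table": node["name"],
        "description": node.get("description", ""),
        "tags": list(node.get("tags", [])),
        # Assumption: "model type" is represented by the materialization.
        "model_type": node.get("config", {}).get("materialized"),
        "compiled_query": node.get("compiled_code"),
        "column_descriptions": {
            name: col.get("description", "")
            for name, col in node.get("columns", {}).items()
        },
        "upstream_models": list(node.get("depends_on", {}).get("nodes", [])),
    }


entry = to_catalog_entry(SAMPLE_NODE)
print(entry["table"], entry["model_type"], entry["tags"])
```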

With Stemma’s integration, all of the data changes made by dbt Cloud are automatically reflected in the Lineage Graph, where users can visually understand the relationships and dependencies for their dbt Cloud-generated models. dbt Cloud users gain even more efficiency from the new “Autodescribe” feature explained in the next section.

Stemma's integration with dbt Cloud enables data teams to confidently and quickly develop their data models. Analytics engineers can rapidly construct pipelines and expertly handle production workflows with dbt Cloud, while simultaneously taking decisive action based on dependencies and usage statistics obtained through Stemma.

Nikhil Kothari

Head of Technology Partnerships
dbt Labs 

“Autodescribe” allows dbt users to document once and then propagate via lineage to derived assets 

Stemma’s Autodescribe feature adds context to columns that lack descriptions

With the new “Autodescribe” feature, Stemma uses column-level lineage to detect whether a column has an upstream dbt model that contains documentation. This documentation is automatically added to the column as a description. Autodescribed documentation is clearly identifiable and can be overwritten either by using the rich text editor on the table details page or by updating the description in the dbt model.
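
The general idea of propagating documentation through lineage can be sketched as an upstream search for the nearest documented column. This is a simplified illustration under stated assumptions, not a description of how Stemma implements Autodescribe: the lineage mapping, the column names, and both functions are hypothetical, and the choice to prefer the nearest documented ancestor is ours.

```python
from collections import deque

# Hypothetical column-level lineage: downstream column -> upstream columns.
UPSTREAM = {
    "fct_orders.customer_id": ["stg_customers.customer_id"],
    "mart_sales.customer_id": ["fct_orders.customer_id"],
    "mart_sales.region": ["stg_customers.region"],
}

# Descriptions written once, in the upstream dbt model (illustrative values).
DESCRIPTIONS = {
    "stg_customers.customer_id": "Unique identifier for a customer.",
}


def find_inherited_description(column, upstream, descriptions):
    """Breadth-first search upstream for the nearest documented ancestor.

    Returns (source_column, description), or None if nothing upstream
    is documented.
    """
    visited = {column}
    queue = deque(upstream.get(column, []))
    while queue:
        current = queue.popleft()
        if current in visited:
            continue
        visited.add(current)
        if descriptions.get(current):
            return current, descriptions[current]
        queue.extend(upstream.get(current, []))
    return None


def describe(column, upstream, descriptions):
    """Prefer a hand-written description; otherwise fall back to an inherited one."""
    if descriptions.get(column):
        return {"text": descriptions[column], "autodescribed": False, "source": column}
    inherited = find_inherited_description(column, upstream, descriptions)
    if inherited is None:
        return None
    source, text = inherited
    return {"text": text, "autodescribed": True, "source": source}


result = describe("mart_sales.customer_id", UPSTREAM, DESCRIPTIONS)
print(result)
```

The `autodescribed` flag mirrors the point that inherited documentation stays identifiable: a description written directly on the column takes precedence and is never marked as inherited.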

The efficiency gains teams see from Autodescribe will depend on their documentation and data development practices. Notably, we have already seen multiple customers automatically propagate documentation to tens of thousands of columns with this feature.

Autodescribe is part of a broader user experience redesign in the Data Developer Update. In addition to improving automated documentation, the “Discover” and “Browse” experiences have been redesigned to make it easier to onboard new users to the data assets available to them. Gathering context for a table is now more intuitive for self-serve users.

In the new UI, context is organized into tabs like user behavior and Slack conversations under the Usage tab

The contents of a table’s detail page are now organized into tabs by the type of question most teams ask of the table. For example, tribal knowledge gathered via Stemma’s Slack integration is now organized under the Usage tab. The new tabs and the questions they are intended to answer are:

  • About - what is this data?
  • Lineage - where does this data come from, and what other data is it feeding?
  • Usage - who is using this data and how are they using it?
  • Additional Fields - custom information for your company

Manage your data development lifecycle with Stemma and dbt

dbt is the data practitioner’s tool of choice for transforming data in the warehouse. With Stemma, data engineers and analytics engineers can cut through the complexity of real-world environments to quickly and confidently build on dbt pipelines while making sure data and documentation are discoverable by end users.

Stemma is the modern data catalog for self-serve cultures, used by companies like Grafana, iRobot, SoFi, and Convoy. Stemma was built by the creators of Amundsen, the leading open-source data catalog used by Lyft, Instacart, Square, ING, Snap and many others. If you want to see the Data Developer Update in action or if you just want to learn more about Stemma, feel free to schedule a demo.

