(Note: this is an update to the post from 10/27/22.)
Over the past two years, the team at Stemma has been working hard to create a more accessible and automated catalog experience that extends beyond Amundsen. To that end, Stemma has made investments in three key areas: providing a friendlier experience for analysts and business users, improving productivity through lineage, and simplifying catalog setup and administration.
Here are a few of our latest updates in each category:
Friendlier for analysts and business users
- The homepage has been restyled as the “Discover” page. Tables and dashboards are now organized by source, making it easier to jump into the relevant browse mode.
- Table and dashboard details pages redesigned to organize granular metadata into functional categories: About, Lineage, Usage, History, & Additional Fields (custom fields).
- Table Usage tab displays granular user and query information.
- Table History tab captures updates to schema, documentation, ownership, & certification status.
- Asset Popularity automatically detected from query history and surfaced to users.
- Certification statuses better indicate how data should be used and how trustworthy it is. Certified tables are given improved rankings in search results, and intermediate & deprecated tables are hidden by default.
- Stemma’s Business Glossary provides a dedicated place to document important terms across your business, assign ownership to those terms, and link them to relevant catalog assets.
- Stemma’s improved PageRank-style search better prioritizes the most relevant results, even for fuzzy search terms.
- Improved Search context: Search automatically traverses table and column documentation, and highlights to users why specific results are returned.
- Search updates in real time to reflect catalog changes.
- Improved 'Advanced Search' with granular search facets and automatic facet detection.
- WYSIWYG description editor enables easy formatting for your catalog documentation, including code blocks with automatic syntax highlighting and the ability to embed images.
- Slackbot to easily capture conversations in Slack and link them to tables in the catalog.
- Welcome Message gives users context about how to use your org’s data and where in the catalog they should start.
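To make the certification behavior in the list above concrete, here is a minimal sketch of how a status could boost or hide search results. It is an illustration only, not Stemma's implementation: the boost value, the status names used as dictionary keys, and the `SearchHit` and `rank_hits` names are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical multipliers; the actual weighting is not described in this post.
CERTIFICATION_BOOST = {"certified": 1.5}
# Statuses hidden from results unless the user opts in.
HIDDEN_BY_DEFAULT = {"intermediate", "deprecated"}


@dataclass
class SearchHit:
    table: str
    relevance: float  # base text-match score, assumed to come from the search engine
    status: str = "none"


def rank_hits(hits, include_hidden=False):
    """Drop hidden statuses (unless requested) and sort by boosted relevance."""
    visible = [
        h for h in hits
        if include_hidden or h.status not in HIDDEN_BY_DEFAULT
    ]
    return sorted(
        visible,
        key=lambda h: h.relevance * CERTIFICATION_BOOST.get(h.status, 1.0),
        reverse=True,
    )


hits = [
    SearchHit("analytics.orders", 0.80, "certified"),
    SearchHit("staging.orders_tmp", 0.95, "intermediate"),
    SearchHit("raw.orders", 0.90),
]
print([h.table for h in rank_hits(hits)])
```

In this sketch the certified table outranks an uncertified one with a slightly higher base score, and the intermediate table only appears when `include_hidden=True` is passed.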
Improved productivity through lineage
- Autodescribe will automatically propagate column descriptions via column-level lineage to any derived columns that do not already have their own description.
- Tabular Lineage now provides a more detailed view of upstream and downstream relationships.
- Visual Lineage Graph has been carefully redesigned to make exploring the lineage of important assets simpler and more intuitive.
- Analysts can quickly see the upstream relationships of a given table or dashboard, as well as when each upstream asset was last updated.
- Data engineers can easily find downstream dependencies to see which tables and dashboards are likely to be affected by a data change and can plan their work accordingly.
- Directly from the table details page, catalog users can communicate with owners of downstream tables and dashboards before making a data change.
- Lineage graphs are now filterable by individual columns, making it easy to see which dashboards use a given column.
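The Autodescribe behavior described at the top of this list can be sketched as a walk over column-level lineage. The following is a simplified illustration under stated assumptions, not Stemma's actual algorithm: the `autodescribe` function, the dictionary shapes, and the column names are hypothetical, and a column with several described parents simply takes the first description that reaches it.

```python
from collections import deque


def autodescribe(descriptions, derived_from):
    """Copy descriptions downstream to derived columns that have none.

    descriptions: column name -> description (None or "" if undocumented)
    derived_from: column name -> list of direct upstream column names
    Returns a new dict; the inputs are not modified.
    """
    result = dict(descriptions)

    # Invert the lineage so we can walk from a column to its derived columns.
    downstream = {}
    for column, parents in derived_from.items():
        for parent in parents:
            downstream.setdefault(parent, []).append(column)

    # Start from every column that already has a description.
    queue = deque(col for col, text in result.items() if text)
    while queue:
        column = queue.popleft()
        for child in downstream.get(column, []):
            if not result.get(child):  # never overwrite an existing description
                result[child] = result[column]
                queue.append(child)
    return result


descriptions = {
    "raw.orders.amount": "Order total in USD",
    "staging.orders.amount": None,
    "mart.revenue.amount": "Recognized revenue in USD",
    "mart.orders.amount": None,
}
derived_from = {
    "staging.orders.amount": ["raw.orders.amount"],
    "mart.revenue.amount": ["staging.orders.amount"],
    "mart.orders.amount": ["staging.orders.amount"],
}
for column, text in autodescribe(descriptions, derived_from).items():
    print(f"{column}: {text}")
```

Here the two undocumented columns inherit "Order total in USD" through lineage, while `mart.revenue.amount` keeps the description it already had.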
Easier to install, integrate, and administer
- Stemma’s GraphQL API allows programmatic retrieval and management of catalog information. Now you can more easily update information and assign owners in bulk, or build metadata into your CI/CD process to enforce responsible ownership and documentation policies.
- Admin panel with automated reporting offers insight into the best places to add documentation and to monitor active users.
- Automated table and column-level lineage through query parsing.
- Automated detection of commonly joined tables through query parsing.
- Out-of-the-box search with no need for Python or Elasticsearch tuning.
- UI-based bulk editing to efficiently apply descriptions, ownership, and other metadata to assets.
- UI-based connection interface allows you to manage your catalog integrations even more easily and securely without needing to write Python databuilder jobs.
- SaaS deployment with SSO, auditing, & backup/restore capabilities.
- Databricks Unity Catalog integration automatically gathers each table's last-updated time, any partitions that exist, and lineage between views.
- dbt Cloud integration automatically gathers table and column descriptions, table tags, compiled queries, and model types from dbt.
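As one way to picture the CI/CD use case mentioned for the GraphQL API above, the sketch below checks a list of table metadata records against a simple ownership and documentation policy. It deliberately does not call the Stemma API: the record shape (`name`, `owners`, `description`) and the function names are assumptions for illustration, and in practice the records would come from whatever query your catalog exposes.

```python
def find_policy_violations(tables):
    """Return a human-readable message for every policy violation found."""
    violations = []
    for table in tables:
        name = table.get("name", "<unnamed>")
        if not table.get("owners"):
            violations.append(f"{name}: no owner assigned")
        if not (table.get("description") or "").strip():
            violations.append(f"{name}: missing description")
    return violations


def run_check(tables):
    """Print violations and return an exit code suitable for a CI step."""
    violations = find_policy_violations(tables)
    for message in violations:
        print(message)
    return 1 if violations else 0


# Hypothetical records, standing in for metadata fetched from the catalog.
sample_tables = [
    {"name": "analytics.orders", "owners": ["dana"], "description": "One row per order."},
    {"name": "analytics.refunds", "owners": [], "description": "  "},
]
exit_code = run_check(sample_tables)
```

A CI job could pass `exit_code` to `sys.exit` so that the pipeline fails whenever a table lacks an owner or a description.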
For more details, check out our new Docs Site, which features a friendlier landing page with quick links to docs based on your role.