Over the last few years, both Convoy's data and the number of employees who use it day-to-day have grown rapidly. That growth made it extremely challenging for Convoy's data users, both data scientists and business users, to discover what data exists and whether it could be trusted. Chad Sanderson, the head of Product, Data Platform at Convoy, explained:
"Convoy's data scientists are hyper-valuable resources, producing models, doing analysis, and figuring out if products are working, yet they were spending an enormous amount of time just trying to find the data they needed to create queries. That was preventing the data scientists, and hence the company, from moving faster and delivering more value for its customers."
Sanderson and the Convoy team undertook an internal user research project to understand the biggest detractors for Convoy's data scientists. Sanderson shared:
"What we learned was that the biggest detractor by far was finding and trusting the data. Data scientists would go into this maze, browsing through Snowflake, searching in Metabase and asking around. It was a huge distraction."
The severity of the problem varied across teams, from painful to show-stopping. Sanderson further explained:
"The teams that wasted the most time discovering data were exactly the ones that couldn't afford to, because they were working on things in growth mode, and they really needed to be spending the majority of their time analyzing and modeling to help the product grow further and to triage business issues."
Stemma is able to fully deliver on Convoy's data discovery and cataloging needs.
Convoy tried solving this problem in a few ways before turning to Stemma. Its first attempt was documenting data in an internal wiki, but the documentation quickly went out of date. Next, Convoy tried dbt docs, with limited success. Finally, the team created a Slack channel where users could ask questions. The Slack channel helped, but responses could take a few days, and the barrier to entry remained high.
Sanderson elaborated:
"It was super common to run into a bottleneck with our Slack channel. That slowed down product development, because if you aren't able to answer the relevant questions a product is trying to solve, you can either carry forward with insufficient information and potentially make the wrong decision, or you have to wait for someone's answer. If everybody is waiting a few days for every answer, the cumulative time spent waiting for Slack conversations to play out is a massive slowdown for the company."
When Convoy evaluated products to solve this problem, Stemma got a huge vote of confidence from Remitly, another Seattle-founded company and a heavy user of Amundsen, the open-source precursor to Stemma. Remitly spoke very highly of the impact its data catalog had there, and that led Convoy to get started with Stemma.
Convoy first shared Stemma with a small alpha group of data scientists, who gave it extremely high praise. Sanderson explained:
"The data scientists really loved the data catalog: the user experience was superlative, the ease with which it integrated with Convoy's cloud infrastructure was excellent, and the automated metadata really delivered value out of the box."
Shortly afterward, Stemma was made available to everyone at Convoy.
Sanderson explained:
"Not only has Stemma cut out a lot of Slack conversations, it has allowed us to expedite the data science workflow. Instead of asking questions on Slack about what data exists, who owns it and how to use it, and then having to wait, you can just open Stemma and look up the automated metadata yourself."
While the data science team is a very heavy user of Stemma and rated it as one of the most valuable data investments made in 2020, Stemma is also used more broadly across Convoy, including by the operations team. Sanderson noted:
"More business-focused operations associates are able to use Stemma to easily find the data they need. This has allowed us to lower the barrier to entry for data use within Convoy."