With the highs in Austin now comfortably under one hundred degrees for weeks at a time, it’s clear that fall is around the corner. One of the more interesting summer activities I find myself reflecting on is the time I spent talking to Heads of Data and Analytics Managers who have embraced the Modern Data Stack. While there were many differences between the companies, reporting structures, and backgrounds, they all managed to blend technical problem solving with consultative empathy. What I learned from talking to them is that Data is as much about integrating different people’s goals, expectations, and thought processes as it is about integrating tools and protocols. I have also noted a few trends that I want to share here.
What we do in the shadows
To start, Data leaders feel particularly under-resourced because very few people on the business side comprehend the complexity the Data team has to manage. By contrast, as overloaded as modern Developer teams are, there is more understanding and sympathy for their work. Part of this is a natural appreciation of how difficult the work must be - “yesterday there was no way to automatically detect topics in a transcription, but today there is. I could never make that happen.” It is also because the impact of their work is felt immediately by the business.
Sales keeps an eye on Dev because they can re-engage and upsell customers when new features come out. As a bonus, if their company launches a feature first, they get a leg up on their competitors. For these reasons, developers bear the hopes and pressures of momentum in a software-enabled business. This gives them leverage when prioritizing the increasing amount of work that shifts left onto their plate. At many companies, one of the casualties of the Dev team’s triage is the data work that was planned in advance.
The release cadence remains king until the executive team decides to make a data-informed decision. It’s at this point that the Data team finds itself between the foot and the gas pedal. For business leaders, it is hard to understand why the data is simply not there. “The product was built with software and logged by software, how can we possibly not have the data?” For leaders in Data, who spend half of their time closely following the changes in Product that will impact pipelines, and the other half understanding the reporting needs of the business, this can feel like the beginning of a Cassandra complex. How can Data possibly have the finger pointed at them when their concerns were set aside at the outset in favor of a speedy release?
If you can’t beat ‘em, join ‘em
One possible solution on the horizon comes from the shifting role of the Data Engineer. Multiple companies have mentioned that Data Engineers who are 70% technical and 30% consultative are moving into Software Engineering, or their roles are being filled by talent with software backgrounds. This shift brings more attention to Data within the Engineering department itself. That’s a good thing, right? It is sometimes coupled with the rising role of the Technical Data Product Manager, who, like other Product Managers, scopes the interests of the business and manages efforts across teams to drive timely and valuable results.
Meanwhile, the other group of Data Engineers, those who are BI consultants first and engineers second, are increasingly becoming Analytics Engineers. With dbt, the Analytics Engineer can build pipelines for the business units while carrying the torch of Data Engineering. Notably, Analytics Engineers carry out their work with the reliability and repeatability of the data in mind. This is in contrast to the work of Analysts, who tend to prioritize the immediate need to discover and report a compelling insight to business leaders, leaving issues around the durability of the data to be figured out later (by the Data team).
The self-serve wild card
The third major trend is the move toward “self-serve data”. I have yet to see an example where this idea came from the Data team, or was even favored by them. Largely this is because self-serve can be seen as a loss of control for the team. That thing where Analysts prioritize insights over reliability? Self-serve without guardrails can mean a lot more “figure it out later” that falls on the Data team as unplanned work.
For these teams, data discovery and a well-documented catalog are increasingly important. This is where “productizing data” along the lines of the Data Mesh concept comes into play. Control is regained by giving data consumers a clear picture of which data should be used, and when, at the point where they find and access it. This is part of an overall strategy of “over-communication” by the teams who have learned to adapt to the self-serve model.
There are knock-on benefits to this approach. Teams who have focused on setting up a catalog have found that the earlier they adopt one, the easier documentation becomes. The longer a team goes without a catalog, the more context is lost to time and to the scattered records of other tools and communication channels. This matters because the biggest benefit of a catalog to these companies is freeing up the time of their most tenured and capable Data Engineers, who are otherwise buried in ad hoc requests and office hours.
It has been a pleasure learning from people who really love data and enjoy the challenge of solving hard problems that have a huge strategic impact on the business. If your experience is different from what I have described here, or if you have a compelling story to share, I would love to hear from you too. If your team is trying to adopt a self-serve model, you are not alone. Check out Stemma and see if we can help you guide your users to reliable data.