What is the Data Renaissance Stack?
Steven Wallace
Introduction
Over the last few years we’ve often heard the term Modern Data Stack (MDS) and immediately thought of certain tools that might relate to it and how they fit together, but there is some variation in how exactly it is defined. To be honest, I don’t think the term itself is a great one.
I think we can all agree on what a data tech stack is: a collection or combination of tools that together can store, move, process and expose data.
But what makes it modern? Surely including the latest, trending tools will make it modern?
This is why I’m not a fan of the term. We need something else to describe the huge shift in the way data solutions are built.
The ‘Modern’ Data Stack vs The ‘Legacy’ or ‘Traditional’ Data Stack
A bit of research shows that a modern data stack should be cloud-oriented, scalable, simpler, more flexible, and designed for non-technical users. But isn’t this just a natural evolution of tech products?
Research more and you’ll see categories of tools such as Orchestration, Observability, Data Warehouse, Data Integration and Reverse ETL, and again I ask myself: aren’t these just a different categorisation of products? There are many traditional tools which could fulfil these functions.
What really is the significance of the ‘Modern’ Data Stack?
In my opinion, it’s happened in the period between ~2010 and now (2024) - it boils down to three key changes within the landscape:
Architectural Awareness
Concepts of data architecture are much more prevalent in data teams, and modern tools have adapted to that.
- Automation and cloud-based solutions reduce the need for hands-on architectural design and maintenance tasks.
- An increase in specialised data roles (data engineers, data scientists) has led to the distribution of tasks which were usually handled by an architect.
- The adoption of DataOps means data management tasks are approached collaboratively using these tools.
Architectural Influences
Software engineering has influenced data engineering. The benefits of microservices have been proven and in turn this has influenced more decoupled solutions to data tools. It’s also brought CI/CD, Version Control, Containerisation, Orchestration and more.
- Storage and compute are separate entities leading to greater flexibility and control over data.
- Transformation is no longer coupled with extraction and loading, and has inherited source code management practices.
- Each tool in a data stack is now modular and can be deployed independently and interface easily.
- Pipelines are composable, tools have the flexibility of being configurable components orchestrated together.
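The modularity and composability described above can be sketched in plain, tool-agnostic Python. This is a hypothetical illustration (the `Stage` class and stage functions are mine, not the API of any real orchestrator): each stage is an independently configurable component, and a thin orchestrator composes them without knowing their internals.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical, tool-agnostic sketch: each stage is an independent,
# configurable component; the orchestrator only composes them.

@dataclass
class Stage:
    name: str
    run: Callable[[list], list]

def extract() -> list:
    # Stand-in for a data integration tool loading raw records.
    return [{"id": 1, "amount": "10"}, {"id": 2, "amount": "25"}]

def transform(rows: list) -> list:
    # Stand-in for a transformation layer (e.g. SQL models under version control).
    return [{**r, "amount": int(r["amount"])} for r in rows]

def expose(rows: list) -> list:
    # Stand-in for reverse ETL / serving: here we just filter what we publish.
    return [r for r in rows if r["amount"] > 15]

def orchestrate(stages: Iterable[Stage], data: list) -> list:
    # The orchestrator knows nothing about each stage's internals,
    # so any stage can be swapped for a different tool independently.
    for stage in stages:
        data = stage.run(data)
    return data

pipeline = [Stage("transform", transform), Stage("expose", expose)]
result = orchestrate(pipeline, extract())
print(result)  # [{'id': 2, 'amount': 25}]
```

The point is the shape, not the code: because each stage shares only a data interface with its neighbours, it can be deployed, versioned and replaced on its own.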
Architectural Innovation
This builds on the influences above: architectures themselves can now be remodelled. Data Mesh is an architectural paradigm created in 2019 which promotes a…
‘...decentralised socio-technical approach to share and manage analytical data in complex and large scale environments within or across organisations.’
In simpler terms it means that:
- Organisational domains have ownership of operational and analytical data.
- That data is a product which can interface with other domain data products.
- There is a centralised data platform that these products run on.
- Across all of the domains, there is a federated governance principle which sets standards and policies to facilitate a data ecosystem.
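To make those principles concrete, here is a minimal, hypothetical sketch (the names are illustrative, not official Data Mesh vocabulary or any tool’s API) of a domain owning a data product, exposing it through a declared schema, and being checked against a federated governance rule:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a domain-owned data product: the domain owns
# the data, publishes a schema (its interface to other domains), and a
# federated policy check applies governance standards shared across domains.

@dataclass
class DataProduct:
    domain: str
    name: str
    schema: dict               # the product's public interface
    rows: list = field(default_factory=list)

    def conforms(self, row: dict) -> bool:
        # The contract other domains can rely on when consuming this product.
        return set(row) == set(self.schema)

    def publish(self, row: dict) -> None:
        if not self.conforms(row):
            raise ValueError(f"{self.domain}.{self.name}: row violates contract")
        self.rows.append(row)

def federated_policy(product: DataProduct) -> bool:
    # A governance standard applied uniformly to every domain's products,
    # e.g. "every product must declare an owner field in its schema".
    return "owner" in product.schema

orders = DataProduct("sales", "orders", {"order_id": int, "owner": str})
orders.publish({"order_id": 42, "owner": "sales-team"})
print(federated_policy(orders))  # True
```

The design choice being illustrated is that ownership and validation live with the domain, while the governance check is defined once and applied across all domains.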
You can find out more about Data Mesh at https://www.datamesh-architecture.com/
Final thoughts
There have been some huge shifts which have produced what we now call the ‘modern’ data stack. A renaissance period, if you like? Everywhere we look there is this fragmentation of responsibilities: more tools, more focused on specific areas, doing a more efficient job.
This may sound like a bad thing, an overwhelming and complex spaghetti of tools. But with the right set of requirements and a good knowledge of the landscape, choosing a tech stack, or even migrating one, has never been easier. Gradually adding components means that company leaders can predict the costs of their data stack as it develops and plan for changes to come.
I think what usually isn’t considered in all of this is how everything outside of the technological advances has changed, including the new mindset of business and data being even more closely related, with domain-driven data teams and hybrid roles like analytics engineers.
These things will grow and evolve, and there will be further influence from AI, which will shift yet more technical tasks onto the business. All of this means that data is already adding far more value to organisations and will continue to do so.