2024 Data Engineering Trends and Predictions
Introduction
2024 will be the year of the data engineer.
The data space moves fast. If you don’t stop and look around
once in a while, you just might miss it.
When it comes to the future of data, a rising tide lifts all ships.
And the value of data will continue to rise in 2024, raising the
standards—and priorities—of the data industry right along with
it.
In this eBook, we’ve analyzed the data landscape and put the
spotlight on 11 of the biggest trends poised to impact data
engineers in 2024.
Reliably yours,
Lior Gavish
I. A Foreword From Lior Gavish
   A renewed focus on data engineering
II. Organizational Trends
   Data contracts
   Cost optimization
   Data reliability engineering
   Data teams will look like software teams
V. Additional Resources
Data Contracts
Now a little wiser and more mature, data contracts have seen real
adoption at enterprise scale, with companies like Whatnot and
GoCardless leading the charge. As data and AI products become
increasingly dependent on high-quality data to succeed, considering
how to enforce governance standards upstream will be absolutely
critical.
Practical insight:
One of the fears of implementing data contracts is that there will
be a lot of upfront work to define schemas that will just change
anyway. However, what we've found with our current user base is
that their needs don't actually change frequently. This is an
important finding because, today, even a single schema change
requires a new contract and a newly provisioned set of
infrastructure, forcing teams to choose between going greenfield
or migrating the old environment.
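To make this concrete, here is a minimal sketch of enforcing a contract at the producer boundary. The event name, fields, and pydantic-based validation are illustrative assumptions, not a prescribed implementation:

```python
# A minimal, illustrative data contract enforced at the producer side.
# The order_created event and its fields are hypothetical; pydantic is
# one of several ways to validate a schema before data leaves the producer.
from datetime import datetime

from pydantic import BaseModel, ValidationError


class OrderCreated(BaseModel):
    """Contract for a hypothetical order_created event, v1."""
    order_id: str
    customer_id: str
    amount_cents: int
    created_at: datetime


def publish(event: dict) -> None:
    try:
        validated = OrderCreated(**event)  # reject contract violations upstream
    except ValidationError as err:
        # Fail loudly at the source instead of letting bad data flow downstream.
        raise RuntimeError(f"order_created contract violation: {err}") from err
    # ...hand the validated event to your queue or warehouse loader here...
    print(f"publishing order {validated.order_id}")


publish({"order_id": "o-1", "customer_id": "c-9", "amount_cents": 1250,
         "created_at": "2024-01-15T10:30:00"})
```

The point of the sketch is where the check runs: violations surface to the producer at publish time, rather than to a downstream analyst days later.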
Cost Optimization
As Harvard Business Review puts it, chief data and AI officers are set
up to fail, and cost pressure is a big reason why. As of Q1 2023, IDC
reports that cloud infrastructure spending rose to $21.5 billion, and
according to McKinsey, many companies are seeing cloud spend grow by
up to 30% each year.
The good news? There are a variety of ways you can reduce cloud
storage and compute costs using the tools already in your data
stack. See below for a few popular approaches:
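One popular approach is hunting down the recurring query patterns that burn the most compute. Here is a minimal, self-contained sketch of that idea; in practice the rows would come from your warehouse's query-history view, and the records here are made up for illustration:

```python
# Illustrative cost triage: rank recurring query patterns by total
# execution time to find optimization targets. Data is hard-coded so
# the example is self-contained; real rows would come from your
# warehouse's query-history view.
from collections import defaultdict

query_history = [
    {"query_id": "q1", "query_text": "SELECT * FROM events", "elapsed_s": 480},
    {"query_id": "q2", "query_text": "SELECT * FROM events", "elapsed_s": 510},
    {"query_id": "q3", "query_text": "SELECT id FROM users", "elapsed_s": 3},
]

total_by_text = defaultdict(float)
for row in query_history:
    total_by_text[row["query_text"]] += row["elapsed_s"]

# The most expensive recurring patterns are the best candidates for
# rewriting, materializing, or scheduling less often.
for text, seconds in sorted(total_by_text.items(), key=lambda kv: -kv[1]):
    print(f"{seconds:>8.1f}s  {text}")
```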
And those just scratch the surface. Check out our new guide, 21 Ways
to Reduce Your Cloud Data Warehouse Costs, to learn more.
Data Reliability Engineering
Included in our trends guide for the second year running, data
reliability engineer is an increasingly important job role with the
responsibility of helping an organization deliver high data availability
and quality throughout the entire data life cycle, from ingestion to
end products: dashboards, machine learning models, and production
datasets.
Data reliability engineers often apply best practices from DevOps and
site reliability engineering such as continuous monitoring, incident
management, and observability to data systems.
Since data is rarely in its ideal, perfectly reliable state, data
teams are hiring data reliability engineers to put the tooling (like data
observability platforms and data testing) and processes (like CI/CD)
in place to ensure that when issues happen, they're quickly resolved
and their impact is conveyed to those who need to know.
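As a flavor of that tooling, here is a minimal sketch of the kind of data test a CI pipeline might run. The daily_orders table, its columns, and the specific checks are assumptions for illustration:

```python
# Illustrative CI-style data tests; table name, columns, and checks
# are hypothetical examples of what a data reliability engineer automates.
import pandas as pd


def check_daily_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of failed checks for the (hypothetical) daily_orders table."""
    failures = []
    if df.empty:
        failures.append("table is empty")
    if df["order_id"].duplicated().any():
        failures.append("duplicate order_id values")
    if df["amount_cents"].lt(0).any():
        failures.append("negative amounts")
    if df["created_at"].isna().any():
        failures.append("null created_at timestamps")
    return failures


df = pd.DataFrame({
    "order_id": ["o-1", "o-2"],
    "amount_cents": [1250, 399],
    "created_at": pd.to_datetime(["2024-01-15", "2024-01-15"]),
})
assert not check_daily_orders(df), "data quality checks failed"
```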
As the data & AI space evolves, we anticipate this role (and others,
like analytics engineer) evolving into “AIOps” or “MLOps” reliability
engineer.
Data Teams Will Look Like Software Teams
The most sophisticated data teams are viewing their data assets
as bona fide data products, complete with product requirements,
documentation, sprints, and even SLAs for end users.
Industry Trends
Return to Office
Nearly four years out from the beginning of the pandemic, the tide
is beginning to turn on working from home. Many companies are
asking employees to return to the office at least a few days a
week. According to a September 2023 report by Resume Builder, 90%
of companies plan to enforce return-to-office policies by the end
of 2024.
The Modern Data Stack
Since the phrase Modern Data Stack hit the scene in the late
2010s, it's largely been defined as a data platform that is:
cloud-based, modular and customizable, best-of-breed first
(choosing the best tool for a specific job, versus an all-in-one
solution), metadata-driven, and runs on SQL.
But regardless of how the modern data stack plays out, we’ll
likely experience some growing pains as we adapt to the needs
of this rapidly unfolding shift in our day-to-day as data
engineers.
Data Mesh
Like everything that finds itself caught in a hype cycle, the
legend of the data mesh grew bigger than the examples of teams
actually doing it successfully. That's because while data mesh is
no doubt a valuable theory, it relies on a mix of tooling,
processes, and organizational change to be successful. And that
latter portion especially doesn't happen overnight.
Fortunately, data mesh isn’t all smoke and mirrors. The theory is
sound, and the value proposition is spot on for many data teams.
Where 2023 took a sober look at the real-world implications of
data mesh, 2024 will be a year of continued reflection on the
massive undertaking it requires.
The general consensus is that data analysts will form the bedrock
of your self-service and data democratization efforts. It's
important that they are, or become, fluent in writing SQL. They
need to be both equipped and empowered to answer the questions
they field, since they sit closest to the business.
Apache Iceberg Gains Popularity
For the past few decades, most companies have kept data in an
organizational silo, locked into whichever engine or vendor wrote
it. Open table formats like Apache Iceberg change that by
decoupling table storage and metadata from any single query engine.
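As a hedged sketch of what the format offers in practice, here is how inspecting an Iceberg table's snapshot history and reading it into Arrow might look with PyIceberg. The catalog name demo and table db.events are assumptions for illustration:

```python
# Illustrative PyIceberg usage; the catalog "demo" and table "db.events"
# are hypothetical and would come from your own catalog configuration.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("demo")  # connection details from ~/.pyiceberg.yaml or env vars
table = catalog.load_table("db.events")

# Iceberg keeps a snapshot history, which enables time travel and rollbacks.
for snapshot in table.snapshots():
    print(snapshot.snapshot_id, snapshot.timestamp_ms)

# Scan a sample of rows into an Arrow table (requires pyarrow installed).
arrow_table = table.scan(limit=100).to_arrow()
print(arrow_table.num_rows)
```

Because the table's metadata lives in the open format rather than in a proprietary engine, Spark, Trino, and Python clients like this one can all read the same data.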
Modern data teams that productize their internal data will need
to abide by several key approaches:
- Gaining early stakeholder alignment
- Taking on a project management mindset
- Investing in self-service tooling
- Prioritizing data quality and reliability
- Ensuring your organizational structure supports your data products
Data Observability
Data quality has been a known issue for some time now. But as
the data observability category develops and data quality
becomes a critical priority for data engineering teams seeking
to drive business value, data observability is quickly becoming
an indispensable layer of the data stack.
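To give a flavor of what this layer automates, here is a hedged sketch of one common check: flagging anomalous daily row counts with a simple z-score. The counts and threshold are made up, and real platforms use far more sophisticated, ML-driven detectors:

```python
# Illustrative volume monitor: flag a day whose row count deviates
# sharply from the recent mean. Counts and threshold are hypothetical.
from statistics import mean, stdev

daily_row_counts = [10_120, 9_980, 10_240, 10_055, 9_870, 10_190, 2_310]

history, latest = daily_row_counts[:-1], daily_row_counts[-1]
mu, sigma = mean(history), stdev(history)
z = (latest - mu) / sigma

if abs(z) > 3:  # simple z-score rule; real detectors account for seasonality
    print(f"volume anomaly: {latest} rows vs. mean {mu:.0f} (z={z:.1f})")
```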
For the third consecutive quarter, Monte Carlo was named the
#1 Data Observability Platform by product review site G2. And
since G2 is powered by real user feedback and ratings, based
on their day-to-day experience, this recognition is especially
gratifying.
Over the last few months, we’ve launched new features like
Performance, which helps teams optimize data pipeline
performance and cost, and our Data Product Dashboard, which
enables organizations to manage the data quality of assets
powering critical applications.