Survey Streaming Data Future Tech Stack
To better understand how and why data streaming is used in cloud applications and the underlying
software stack, Lightbend has partnered with The New Stack for its second survey on fast data
trends. Eight hundred and four IT professionals provided details about applications that use stream
processing at their organizations. Respondents were primarily from Western countries (41% in Europe
and 37% in North America) and worked at an approximately equal percentage of small, medium and
large organizations.
Architect 25.9%
IT management, including CIO / CISO / CTO 7%
DevOps 1.7%
Other 1.7%
The report asked in-depth questions of the 74% of respondents who have applications that use
stream processing technology.
STREAMING DATA AND THE FUTURE TECH STACK
Table of Contents
Executive Summary
Artificial Intelligence and Machine Learning Overtaking Early Adopters’ Use Cases
  Unflashy app monitoring, log aggregation and ETL top current use cases
  AI/ML, IoT and ETL adoption rose the most over 2017
  AI/ML use cases expected to see the biggest increases
About Lightbend
Executive Summary
Use cases for data streaming are growing in scope. Application monitoring; log aggregation; and
extract, transform, load (ETL) are still the most common use cases among IT professionals, but
the advent of containers and microservices application architecture allows these workloads to
be packaged and run in a way that provides additional business value beyond the enterprise IT
department. System architects now see container orchestrators, such as Kubernetes, as the real
pathway toward data streaming adoption within their organizations.
Scaled-out, distributed architectures are built by teams of developers whose experience dictates
which data streaming technologies to adopt into the services they are building and managing. Choosing a data
streaming architecture built for microservices becomes a salient decision. Akka, for example, adds
value by providing fine-grained control of the kinds of processing possible as well as maximizing the
application’s efficiency and reliability. Many stream processing use cases don’t actually require this,
but AI/ML and real-time recommendation engines often do. Further data streaming adoption will
soon follow as their services become more advanced and machine learning and artificial intelligence
become more important to achieve higher business value.
However, barriers are still high for developing and managing application architectures on data streaming
infrastructure. It’s sophisticated technology that requires an understanding of how to keep a long-running
application resilient and able to scale up and down. Developer teams are adapting and trying new
workflows but it can be very risky when the impact on performance is unknown. The right knowledge for
the problem is still the biggest challenge in adoption of nascent data streaming technologies.
Containers are a mechanism to lower the barriers to entry and make data streaming valuable in
at-scale settings. The barriers will diminish further as developers become more comfortable using
data streaming in microservices applications. State management has already become less of an issue
in organizations where microservices adoption is high, because they can leverage existing, proven
design patterns. Today, handling state is complex but manageable if organizations are committed
to microservices architectures. We expect containers and microservices architecture to have a
continued impact on the evolution of data streaming — allowing organizations to take more risks and
realize the results faster than ever before.
Key Finding 01
Artificial Intelligence and Machine Learning Overtaking Early Adopters’ Use Cases
• The use of stream processing for AI/ML applications increased five-fold in two years. Those
already using streaming for AI/ML expect this trend to continue with even broader use in the
coming year.
Key Finding 02
Early Adopters Concerned About Unknowns
• Developer experience, familiarity with tools, and technical complexity are barriers to adoption.
Concern about scalability, latency and other technical challenges increases as the number of
workloads utilizing stream processing rises.
Key Finding 03
Concern About “State” Lessens as More Applications Use Stream Processing
• Persisting data in a microservices architecture becomes less of an issue as users gain more
experience with containers, microservices and modern databases. Architects see a future
where stream processing and microservices are deployed in the same container-based
infrastructure stack.
Key Finding 04
Technologists Looking Beyond Kafka for Advanced Use Cases
• While Kafka is sufficient for ETL and messaging, it faces robust vendor competition among
streaming platforms for advanced use cases such as IoT pipelines and recommendation engines.
In which of the following applications and use cases does your organization
utilize real-time or stream processing in production environments?
ETL 36.3%
• Companies have embraced real-time processing of data as a way to handle machine generated
data and to more efficiently manage existing data environments.
• Application monitoring and log aggregation are the top use cases because it is essential to detect
problems quickly instead of waiting for offline analysis. Instead of storing all of the raw data
generated by modern applications and systems, the data is often aggregated and stored into time-
series databases that only store metrics that can be easily analyzed.
• Based on a question only asked of the developers, 45% of developers surveyed have experience
working with at least one streaming data application that has been deployed into production or
will be within the next six months.
• ETL, data warehousing, and recommendation and decision engines use cases are more than
twice as likely to be deployed at organizations where developers have hands-on experience
incorporating data streams into a production-ready application. ETL and data warehousing are old
problems for which streaming is now being applied. Although recommendation engines are used
less often, developers are more likely to be involved with these types of applications.
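The aggregation pattern described above for application monitoring and log aggregation, where raw events are reduced to metrics before storage, can be illustrated with a minimal sketch. It is not taken from the survey or from any particular product: the function name `aggregate_per_window`, the 60-second window and the sample latency events are all hypothetical, and a production system would consume an unbounded stream rather than a small list.

```python
from collections import defaultdict


def aggregate_per_window(events, window_seconds=60):
    """Reduce raw (timestamp, latency_ms) events to one metric row per time window.

    Hypothetical helper for illustration only: it keeps a running count, sum
    and maximum per window instead of retaining every raw event.
    """
    # window start (seconds) -> [count, sum of latencies, max latency]
    buckets = defaultdict(lambda: [0, 0.0, 0.0])
    for timestamp, latency_ms in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        bucket = buckets[window_start]
        bucket[0] += 1
        bucket[1] += latency_ms
        bucket[2] = max(bucket[2], latency_ms)
    # Emit one compact row per window, ordered by time.
    return [
        {"window_start": start, "count": count, "mean_ms": total / count, "max_ms": peak}
        for start, (count, total, peak) in sorted(buckets.items())
    ]


# Assumed sample data: five raw events spread over two one-minute windows.
raw_events = [(0, 120.0), (15, 80.0), (59, 100.0), (61, 300.0), (119, 100.0)]
metrics = aggregate_per_window(raw_events)
for row in metrics:
    print(row)
```

Only the two summary rows would need to be written to a time-series store, rather than the five raw events.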
AI/ML, IoT and ETL adoption rose the most over 2017
Companies processing data in real time for AI/ML use cases jumped from 6% in 2017 to 33% in 2019 —
a more than five-fold increase.
2019 = use of stream processing in production
2017 = use of “fast data,” which was defined as processing data streams as
they arrive while still supporting batch processing.
Use case                                      2017   2019
ETL                                           14%    36%
Artificial intelligence / machine learning     6%    33%
Integration of different data streams         13%    32%
Operational insights                          13%    16%
Unlabeled values remaining from the chart: 17%, 5%, 8%, 6%
Only categories that are phrased exactly the same as in the 2017 survey were included in this time series chart.
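The "more than five-fold" figure for AI/ML can be checked with simple arithmetic. The sketch below assumes the 2017 and 2019 percentages as read from the time series chart; the helper name `fold_change` is hypothetical.

```python
# 2017 and 2019 adoption percentages, as read from the time series chart.
adoption = {
    "ETL": (14, 36),
    "Artificial intelligence / machine learning": (6, 33),
    "Integration of different data streams": (13, 32),
    "Operational insights": (13, 16),
}


def fold_change(before, after):
    """Ratio of the later value to the earlier one (hypothetical helper)."""
    return after / before


for use_case, (pct_2017, pct_2019) in adoption.items():
    print(f"{use_case}: {fold_change(pct_2017, pct_2019):.1f}x")
```

For AI/ML this gives 33 / 6 = 5.5, consistent with the five-fold claim, while operational insights barely moves.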
• Production-level adoption widened dramatically, with several use cases seeing big jumps over
the last two years. The sharp rise in real-time processing for IoT pipelines, ETL and integration of
different data streams indicates that organizations need to extract insights from their data and
leverage advanced analytics (such as AI/ML) as quickly as possible.
• Adoption of stream processing for business operations was relatively stagnant, likely because
operational insights and consolidated views of customers can usually be successfully implemented
with the time lags associated with batch processing.
• Similar stagnation is seen for traditional statistical analysis, which had previously seen wide
consideration among companies that used Hadoop.
ETL 11.4%
Other 1.1%
None 9.9%
• The consensus is that AI/ML use cases will see some of the largest increases in the next year.
• Not only will adoption widen to different use cases, it will also deepen for existing use cases, as
real-time data processing is utilized at a greater scale. Few organizations are outright rejecting
current use cases. Instead, they are significantly more likely to say that their current use cases will
expand the most in the next 12 months.
• In addition to AI/ML, enthusiasm among adopters of IoT pipelines is dramatic — 48% of those
already incorporating IoT data say this use case will see some of the biggest near-term growth.
What are the top two challenges your organization faces in processing data immediately?
Integration with legacy infrastructure 24.2%
Cost of technology 12%
Finding and retaining staff with data engineering, operations or analytical skills 10.8%
Debugging 6.9%
Don’t know or N/A 3.7%
Other 2.3%
Respondents at organizations that don't use stream processing were not asked this question
• Effectively processing data immediately often requires developers to adopt broad changes to
their development and production environments, so it is unsurprising that additional knowledge
is needed.
• The second and third most commonly cited technical challenges for stream processing are
choosing and integrating the right tools and techniques.
• Some, but not all, of these challenges can be overcome with experience.
–– The more applications that utilize real-time data, the less often developer knowledge is cited as a
concern. Indeed, only 24% of developers with hands-on experience incorporating streaming data
into production applications say picking the right tools and techniques is a challenge.
–– Integration issues become a greater concern as more applications come online using different tools
and data types.
–– Unsurprisingly, scaling to handle high data volumes was twice as likely to be cited as a challenge by
those that had more than a quarter of their workloads comprised of stream processing.
Only respondents at organizations that use stream processing were asked this question.
• Many respondents were not knowledgeable enough to know to what degree technical challenges
are inhibiting adoption. However, architects believe they know about these issues as they
answered “don’t know” to these questions half as often. Architects were also twice as likely to be
greatly concerned about compute resource requirements.
• The application’s end user is a particular concern when stream processing is utilized in
applications that require non-technical teams to actively use an application. Thus, organizations
that utilize stream processing are more concerned about one type of user (DBAs) when they
have an active data warehouse use case, and another (business analyst) when streaming data is
integrated into dashboards for operational insights.
• Concern about scalability and latency increased as the number and types of workloads utilizing
stream processing rose. In particular, latency is twice as likely to be inhibiting stream processing
among organizations that are working on recommendation and decision engines or IoT pipelines.
No need 43%
Developers do not have the knowledge needed to write robust and performant applications. 28%
Difficulty making changes to existing solutions and infrastructure 24%
Only respondents at organizations with 0% of applications using stream processing were asked this question
• Two-thirds of respondents believe that handling state is at least partly inhibiting the deployment of
more applications within additional microservices architecture. This is not a permanent barrier though.
• Only 18% of respondents believe state is not at all an obstacle to microservices adoption. However,
among those using streaming in more than half of their applications, 41% say it’s not at all an obstacle.
• Organizations that have adopted microservices are the farthest along with stream processing.
While 58% of respondents are using microservices in production, that figure jumps to 74% among
those with more than a quarter of their applications utilizing stream processing.
• The people who believe state is greatly inhibiting adoption are also those most likely to believe that
increased developer knowledge is a key challenge for processing data immediately. In fact, those that
have solved the “state” problem and say it is “not at all” an inhibitor to microservices deployment are
more than twice as likely to have more than half of their applications utilizing stream processing.
• Organizations that have yet to utilize stream processing are particularly concerned that their
developers do not have enough knowledge to write performant applications. Increased education
about methods to handle state may increase adoption.
• Based on additional survey questions, we found that organizations with a high percentage of
applications using stream processing are utilizing more persistent datasets and storage models.
This is consistent with the fact that they are less concerned about persisting state.
Data store (two chart series; series labels not recoverable)
Elasticsearch                             45%   44%
Cassandra                                 39%   49%
PostgreSQL                                36%   43%
MongoDB                                   29%   38%
Hadoop Distributed File System (HDFS)     23%   32%
MySQL                                     22%   21%
Redis                                     22%   30%
Oracle                                    18%   11%
SQL Server                                15%   18%
Other                                     12%   14%
MariaDB                                    9%   14%
DB2                                        5%    3%
SQLite                                     4%    6%
• Users of modern data stores like Cassandra, MongoDB and Redis are less likely to believe state is
inhibiting adoption of microservices.
• However, some of the most common technologies used with stream processing are deployed by
those who believe handling state is greatly inhibiting microservices adoption. On average, this
group’s production-level adoption of Apache Kafka, Apache Spark Streaming and Elasticsearch
was 36% higher than the sample as a whole.
Which programming languages and frameworks do you regularly work with?
C# 14%
Go 14%
Kotlin 13%
C 8%
If using Java, with which of the following ways do you handle persistence?
Serialization 20%
Other type of object-relational mapping (ORM) 19%
Akka persistence 15%
Other 6%
• Respondents that utilize serialization often did not know if state is inhibiting microservices
adoption because an internal team often is handling the relevant infrastructure. In fact, 41% of
those utilizing serialization say they use a private, cloud-enabled datacenter with applications that
take advantage of stream processing.
In the next 12 months, how likely are you to deploy stream processing
technology in the same “stack” as the following technologies?
Container orchestration (e.g., Kubernetes) 42% 26% 8% 5% 7% 12%
Function as a Service (e.g., AWS Lambda) 20% 25% 14% 10% 14% 17%
• Sixty-eight percent of architects believe it is at least somewhat likely that stream processing will be
deployed in the same stack as a container orchestrator like Kubernetes. This does not necessarily
mean that data persistence will be addressed within a container, but rather that some component
of an application will be hosted in a container cluster.
• Utilization of Function as a Service (FaaS) is less likely to be part of the stream processing stack. This
signals that event processing, which is often associated with a serverless architecture, has not become
an essential stream processing use case. When event processing does become more prevalent, we
expect that FaaS will rise in importance as a way to handle compute resource requirements.
Technologists Looking Beyond Kafka for Advanced Use Cases
Kafka use is widespread
Apache Kafka is often used in conjunction with other stream processing technologies.
What is your experience with the following stream processing technologies?
Only respondents at organizations that use stream processing were asked this question.
• The market has embraced Kafka because it is a robust, scalable way to capture streaming data and
serve it between applications.
• Users of Apache Kafka are 60% more likely to have Akka Streams in production. Many respondents
are likely only utilizing Kafka for messaging and not taking advantage of the Kafka Streams library.
• Akka is also being considered when the overhead of big data systems like Spark is high relative to
the amount of data processing being done.
• Recommendation and decision engines is an area where, despite initial adoption, Kafka is falling
short. In fact, while 70% of organizations with recommendation and decision engine use cases are
using Kafka, 35% of these organizations are evaluating or piloting Akka Streams. IoT pipelines are
also demanding technologies in addition to the stream storage enabled by Kafka. Apache Flink
gets more attention for this use case as compared to others, as 25% of organizations that use IoT
pipelines are evaluating or piloting Flink.
Other 7% 3 87%
StreamSets 6% 3 89%
Ably 94%
EsperTech 92%
Striim 93%
Only respondents at organizations that use stream processing were asked this question.
Private cloud data centers rarely used in conjunction with stream processing
Consistent with its overall market penetration, two-thirds of organizations say an AWS cloud is used
to some extent with production applications that have stream processing components in them.
Which of the following cloud platforms does your organization use at least in part
for production applications that have stream processing components in them?
Amazon AWS 68%
Microsoft Azure 27%
Google Cloud 25.2%
Red Hat OpenShift 13.2%
Heroku 4.4%
Other 4.4%
Only respondents at organizations that use stream processing were asked this question.
• Cloud platforms may be used with applications that include stream processing, but the vendors’ own
stream processing offerings are not always adopted at the same rate by their customers. For example:
–– Google Cloud customers: Of those that have a stream processing application hosted in part with
Google, 45% are using one of Google’s own stream processing offerings versus the 13% of the overall
sample that use it. That’s a 246% increase in comparison.
–– Azure customers: Of those that have a stream processing application hosted in part with Azure, 18%
are using one of Azure’s own stream processing offerings versus the 8% of the overall sample that use
it. That’s a 125% increase in comparison.
–– AWS customers: Of those that have a stream processing application hosted in part with AWS, 37% are
using one of AWS’ own stream processing offerings versus the 26% of the overall sample that use it.
That’s a 42% increase in comparison.
• Azure production usage is particularly strong among those who have IoT use cases and those
expecting integration of multiple data streams to become front-and-center in the next year. Azure
customers are also likely to be giving consideration to Lightbend and to Pivotal, both of which
have strong positions among large enterprises.
• Red Hat OpenShift also has strong penetration among large enterprises, with 43% of its users
having more than 10,000 employees. Although they are likely running on separate clusters, Red
Hat’s OpenShift is often used as a platform for applications with stream processing at companies
that also utilize Hadoop in these applications. Twice as many of its respondents (48% versus 24%
for the overall sample) utilize Hadoop as a data store for stream processing.
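The "increase in comparison" figures quoted for Google Cloud, Azure and AWS customers are relative differences between each platform's customers and the overall sample. A minimal sketch of that arithmetic follows, using the percentages stated in the bullets above; the helper name `relative_increase` is hypothetical.

```python
def relative_increase(group_pct, overall_pct):
    """Percentage by which group_pct exceeds overall_pct (hypothetical helper)."""
    return (group_pct - overall_pct) / overall_pct * 100


# (share among the platform's customers, share in the overall sample)
comparisons = {
    "Google Cloud": (45, 13),
    "Azure": (18, 8),
    "AWS": (37, 26),
}

for vendor, (hosted_pct, overall_pct) in comparisons.items():
    print(f"{vendor}: {relative_increase(hosted_pct, overall_pct):.0f}% higher")
```

Rounded to whole percentages, the results match the 246%, 125% and 42% figures in the report.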
Don’t know 16.3%
Yes 41.8%
No 41.8%

ETL 57.7%
Messaging 57.3%
Pub/sub 41%
Database 39.4%
Replication 23.9%
Other 4.7%
• There is a tipping point for possible disruption: the share of respondents who say streaming is replacing existing
technologies rises from 42% to 68% among those who use stream processing in more than a
quarter of their applications.
• Organizations that believe stream processing is replacing databases are more likely to use
MySQL and Hadoop as data sources for stream processing. Neither of these technologies was
designed to quickly handle the volume of data involved with streaming data use cases. Since
these are open source data stores, people may believe it is easier to swap them out with another
open source offering.
• The parallel trend of migration to container-based infrastructure, e.g., Kubernetes, is also driving
streaming data pipelines to look more like conventional microservices.
About Lightbend
Lightbend (@Lightbend) is leading the enterprise transformation toward real-time,
cloud-native applications. Lightbend Platform provides scalable, high-performance
microservices frameworks and streaming engines for building data-centric systems
that are optimized to run on cloud-native infrastructure. The most admired brands
around the globe are transforming their businesses with Lightbend, engaging billions
of users every day through software that is changing the world. For more information,
visit www.lightbend.com.
About The New Stack
The New Stack publishes explanation and analysis of at-scale, distributed technologies
for developers, DevOps and other IT professionals. The New Stack is a critical and
trusted resource for all people making complex technical decisions. Visit our website
for original articles, podcasts, ebooks and research at https://thenewstack.io.