Kafka in the Cloud: 10x

Contents

Section 1: Elasticity
Section 2: Storage
Section 3: Resiliency
With cloud becoming the new normal for modern IT infrastructure, people often imagine that bringing open source software like Apache Kafka® to the cloud is a simple matter of packaging up the software and putting it in Kubernetes on some public cloud instances. In reality, it's much harder than that.

Data streaming has become imperative for businesses to thrive in today's dynamic, digital-first landscape, and Apache Kafka® sits at the heart of it. But running Kafka yourself means, among other things, carefully planning and provisioning storage limits, constantly throttling tenants, and expiring data for clusters to ensure your retained data doesn't exceed your broker disk capacity.

To truly realize the value of the cloud and focus your resources on business growth, you need a fully managed cloud-native service that abstracts these operational complexities away for you. In short, you need a cloud service for Kafka that takes limited data and infrastructure capabilities and transforms them into highly available shared resources that teams can use as much or as little as needed, whenever they want.
Enter Confluent Cloud: a truly cloud-native service that is 10x better

Confluent Cloud allows us to harness the full power of the cloud and provide a Kafka service that is substantially better than Kafka alone. In fact, across a number of performance metrics, Confluent Cloud is now 10x better than self-managed open source Kafka or semi-managed services.

Confluent Cloud offers Apache Kafka's protocol and is 100% compatible with the open source ecosystem. And since it's purpose-built for the cloud, virtually every layer in the stack has been transformed: how data is routed over the network, how requests are processed, how data is stored, where data is placed and when it is moved, and how all of this is controlled and observed at scale.

In this ebook, we'll explore three specific areas where Confluent Cloud has re-architected Apache Kafka to be 10x better: elasticity, storage, and resiliency. Along the way, we'll discuss how we achieved these improvements and the benefits your teams stand to gain from them.

Confluent Cloud is the only truly fully managed, cloud-native service for Apache Kafka. Over the last five years, we've poured more than 3 million engineering hours into building a Kafka service that is:

Cloud Native
We've completely re-architected Kafka for the cloud to be elastically scalable and globally available, providing a serverless, cost-effective, and fully managed service ready to deploy, operate, and scale in a matter of minutes.

Complete
Confluent completes Kafka with 120+ connectors, stream processing, enterprise security and governance, global resilience, and more, eliminating the burden and risk of building and maintaining these capabilities yourself.

Everywhere
Whether in the cloud, across multiple clouds, or on-prem, Confluent has you covered. Plus, you can seamlessly link it all together in real time to create a consistent data layer across your business.
Elasticity

[Figure: rebalancing data across brokers when expanding a cluster.]

With the bandwidth available for replication, a rebalance like this could take 43 hours using open source Kafka. To solve the elasticity challenge, Confluent Cloud developed Intelligent Storage, which uses multiple layers of cloud storage and workload awareness to keep a subset of "hot" data locally on the broker while offloading the rest to object storage. In the example above, each broker would then keep 8.6 TB locally and 251.6 TB in object storage, and we only need to move 6.5 TB of data in total (¾ of 8.6 TB).
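To make the arithmetic concrete, here is a minimal sketch of the rebalance math. The 8.6 TB hot set and ~6.5 TB moved are the ebook's figures; the assumption that the example is a three-broker cluster expanding to four is ours, chosen because it reproduces those numbers:

```python
# Sketch of the tiered-storage rebalance arithmetic. Assumes a 3-broker
# cluster expanding to 4; only the "hot" data on local disks has to move,
# because the bulk of the log already sits in shared object storage.
HOT_TB_PER_BROKER = 8.6                 # local "hot" set per broker (from the ebook)
BROKERS_BEFORE, BROKERS_AFTER = 3, 4    # our assumption

total_hot_tb = HOT_TB_PER_BROKER * BROKERS_BEFORE   # 25.8 TB on local disks
moved_tb = total_hot_tb / BROKERS_AFTER             # the new broker's 1/4 share

# ~6.45 TB, i.e. 3/4 of one broker's 8.6 TB hot set; the ~251.6 TB per
# broker in object storage never moves, which is what makes scaling fast.
print(f"Data to move with Intelligent Storage: {moved_tb:.2f} TB")
```

Without tiering, a proportional share of the full retained log would have to stream between brokers instead, which is where the 43-hour figure for open source Kafka comes from.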
Scaling up and down instantly with Confluent Cloud

Elastic scaling applies when sizing both up and down. Once the holiday rush is over, you don't want that highly provisioned cluster sticking around costing you money. However, teams working with Apache Kafka have historically had very limited, time-consuming options for how to do this.

Capacity adjustments require a complex process of sizing and provisioning of new clusters, networking setup, traffic balancing across new brokers and partitions, and much more. Too often, the manual effort isn't worth the savings of running a smaller cluster.

With Confluent Cloud, you can shrink clusters just as fast as you can expand them. Our Basic and Standard offerings allow instant auto-scaling and can scale down to zero with no pre-configuration. With our Dedicated clusters, you can scale down clusters by just moving the CKU slider.

As validated in a recent study by GigaOm, Confluent Cloud completely manages scaling operations and therefore requires zero time or interaction to manage the size of the infrastructure. By contrast, Apache Kafka requires considerable time and effort (47 story points) for networking, load balancing, and more.

Best of all, the 10x faster scaling happens behind the curtain of our fully managed offering, making scaling 10x easier for you. To learn more about the effort it would take to resize a cluster with self-managed Kafka, check out a live discussion and demo, and read the blog post to learn more about how we made Confluent Cloud 10x more elastic.

How 10x elasticity can reduce your total cost of ownership

- Avoid over-provisioning: With Confluent Cloud, you can scale your Kafka clusters right before the traffic hits. And with the ability to shrink fast, you avoid overpaying for any excess capacity when traffic slows down.
- Enable a faster, less expensive service: Because Confluent Cloud takes advantage of a blend of object storage and faster disks in an intelligent manner, you get a lower latency service, at a competitive price, that's billed only when you use it.
- Remove cycles spent on infrastructure: Since scaling is 10x easier without operational burdens, you can reallocate your engineering resources to build something that differentiates your business.

"…make changes alleviates work on our end and makes it easy."
Lucas Recknagel

[Figure: Apache Kafka capacity planning vs. actual throughput. With faster scaling up and down, businesses save on TCO by avoiding wasted capacity from over-provisioning.]
Storage

Never worry about Kafka retention limits again with Infinite Storage

Keeping real-time and historical data in Apache Kafka allows for more advanced use cases, faster decision making, and better compliance with data retention requirements. However, there is a practical limit to how much you can store on a single Apache Kafka broker. Because storage and compute are tied together, you have to provision additional brokers and pay for more resources when you hit that limit. This makes retaining the right amount of data in Kafka operationally complex and expensive, with operators having to constantly throttle tenants to monitor storage limits, expire data for clusters that reach capacity, and negotiate retention times with application teams to maintain cluster uptime and cut costs.

Powerful use cases are enabled when Kafka's storage limit is lifted:

- System of record: Kafka is often the first place data lands across systems. With no retention limits, Infinite Storage helps establish Confluent Cloud as a source of truth in an organization, providing a persistent and authoritative view of data that any user or system can immediately access.
- Real-time application and analysis with historical data: With Confluent Cloud as a system of record, businesses have access to real-time and historical events from any time period. This powers all sorts of use cases, from data reprocessing with all historic data to rebuilding history from various systems to machine learning (ML). Check out a live ML demo leveraging Infinite Storage.
- Meeting data retention compliance requirements: For example, financial institutions are required to retain data for seven years. During audits, companies usually create a new application just to surface data from this time period. It's infinitely simpler to read this data from an existing Kafka log than having to reload data from various data sources, as the sketch below shows.
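As a hedged sketch of that compliance pattern, the snippet below uses the confluent-kafka Python client to start reading a topic from a point in time. The cluster address, topic name, partition count, and audit date are illustrative placeholders, not values from the ebook:

```python
from datetime import datetime, timezone

from confluent_kafka import Consumer, TopicPartition

# Connection details and topic layout are illustrative placeholders.
consumer = Consumer({
    "bootstrap.servers": "<bootstrap-endpoint>",
    "group.id": "audit-replay",
    "auto.offset.reset": "earliest",
})

# Translate the start of the audit window into per-partition offsets...
audit_start_ms = int(datetime(2017, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)
start_offsets = consumer.offsets_for_times(
    [TopicPartition("transactions", p, audit_start_ms) for p in range(6)],
    timeout=10,
)

# ...and consume forward from there, straight out of the retained log,
# with no reloading of data from upstream source systems.
consumer.assign(start_offsets)
```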
To illustrate why the separation of storage and compute matters to resource utilization, imagine a use case that requires us to produce at a sustained rate of 150 MB/s to a topic with the default 7-day retention. This means that by day seven, you would have about 272 TB of data. In the image below, the area shaded in blue represents the amount of throughput actually required to satisfy these requirements, while the area in red represents the extra unused compute resources you'll have to provision to satisfy storage requirements when operating open source Kafka, since you can only attach a limited amount of storage to a single broker.

With Confluent Cloud, because storage and compute are separated, storage can automatically scale as you need it, without limits on retention time. It provides a proper cloud consumption model that allows users to store as much data as they want, for as long as they need, while only paying for the storage used. As a result, you'll be able to scale your storage infinitely better than with Apache Kafka.

[Figure: Growth of compute resources as storage scales (Apache Kafka vs. Confluent Cloud). Throughput (MB/s, 0-750) vs. storage (GB, 50,000-300,000); unused throughput required with Apache Kafka shown in red, compute actually needed with Confluent Cloud in blue.]
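The sizing math above is easy to reproduce. The small sketch below uses the ebook's 150 MB/s rate and 7-day retention; the replication factor of 3 is our assumption, included because it matches the quoted ~272 TB:

```python
# Reproducing the back-of-the-envelope sizing from the text.
PRODUCE_MB_PER_S = 150      # sustained produce rate (from the ebook)
RETENTION_DAYS = 7          # default topic retention (from the ebook)
REPLICATION_FACTOR = 3      # our assumption, to match the ~272 TB figure

logical_tb = PRODUCE_MB_PER_S * 86_400 * RETENTION_DAYS / 1_000_000  # ~90.7 TB written
stored_tb = logical_tb * REPLICATION_FACTOR                          # ~272 TB on disk

print(f"Retained by day {RETENTION_DAYS}: {stored_tb:.0f} TB")
```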
10x more performant storage with Confluent Cloud

Performance is another place where we made storage 10x better. In Apache Kafka, mixing consumers that read both real-time (latest) and historical (earliest) data can cause a lot of strain on the I/O system, slowing down throughput and increasing latency.

One of the great performance wins with Infinite Storage is the resource isolation that happens between these reads. Since "historical" consumers read from object storage, they rely on a network that consumes a separate resource pool from real-time consumers. With this adjustment, the large batch reads that you typically see with historical workloads will not compete with the streaming nature of real-time workloads, preventing latency spikes and improving throughput (illustrated in the sketch below). Read our blog post to learn more about how we built Kafka storage that's 10x more scalable.
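Here is a minimal sketch of the two consumer profiles being isolated, using the confluent-kafka Python client; the topic name and endpoint are placeholders:

```python
from confluent_kafka import Consumer

BOOTSTRAP = "<bootstrap-endpoint>"  # placeholder

# Real-time consumer: tails the head of the log, served from broker-local storage.
realtime = Consumer({
    "bootstrap.servers": BOOTSTRAP,
    "group.id": "live-dashboard",
    "auto.offset.reset": "latest",
})

# Historical consumer: replays the log from the beginning. In Confluent Cloud
# these reads are served from object storage over a separate resource pool, so
# their large batch fetches don't add latency to the real-time path.
historical = Consumer({
    "bootstrap.servers": BOOTSTRAP,
    "group.id": "backfill-job",
    "auto.offset.reset": "earliest",
})

realtime.subscribe(["clickstream"])
historical.subscribe(["clickstream"])
```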
How 10x storage can reduce your total cost of ownership

- Pay only for the retained data: Confluent Cloud is pay-as-you-go, so you are only billed for actual storage used, not for any pre-provisioned or unused capacity. Also, with compute and storage separated, you no longer have to waste unnecessary compute infrastructure for storage-bound use cases.
- Reduce operational burdens: Infinite Storage by itself removes all the operational complexities of carefully planning and retaining the right amount of data in Kafka. What's more, Confluent Cloud can auto-scale storage based on retention policies and traffic, further freeing your operations resources for more value-adding activities.
- Avoid downtime, data loss, or a possible breach in data retention compliance: With Infinite Storage, you never have to worry about revenue loss or audit fines due to storage downtime or data loss caused by capacity limits or data expiration.

[Figure: Real-time consumers read from broker-local storage while historical consumers read from object storage. Resource isolation between real-time and historic data consumption leads to 10x storage performance.]
Resiliency

Leave Kafka reliability worries behind with 10x durability

A cloud product is only as useful as it is resilient. As businesses mature in their Apache Kafka adoption, downtime can mean reputational damage, fines or audits, reduced customer satisfaction scores, or critical data loss.

Kafka is designed for high availability and durability through replication. However, this design is insufficient for a highly reliable data streaming service in the cloud, and it doesn't take away all the operational burdens and risks. This is especially true when complexities multiply as Kafka spans across more use cases, apps, data systems, teams, and environments. Issues that can arise include:

- Limited downtime protections for Kafka software failures, zone failures, or cloud provider outages
- Lack of durability monitoring to help detect, prevent, and mitigate data integrity issues, either in real time or in batch
- High operational complexities for availability configuration, resiliency policy design, disaster recovery and failover deployment, manual upgrades and patching, etc.

[Figure: Downtime or failures can happen in multiple areas in a typical Kafka deployment, from the client through networking, Kubernetes, and the underlying infrastructure.]

Each partition in Confluent Cloud is replicated three times, and Confluent Cloud makes sure that these three copies are distributed across three different availability zones. This ensures that two copies remain available even when an entire zone fails. Confluent Cloud also monitors the health of the underlying cloud services (storage, compute, etc.) and mitigates their impact by appropriately isolating the impacted node/replica. It also auto-rebalances the cluster as brokers are added or removed and in uneven workload scenarios.

Confluent Cloud's SLA covers not only infrastructure but also Apache Kafka performance, critical bug fixes, security updates, and more: something no other hosted Kafka service can claim.
Let's take an OSS Kafka deployment in the cloud (one that follows all best practices) as an example. Even if every component meets its SLA, the availability of the whole deployment is bounded by the product of the component SLAs:

99.5% x 99.99% x 99.99% ≈ 99.48%¹

By contrast, we introduced a 99.99% uptime availability SLA for our customers, one of the industry's highest and most comprehensive SLAs, and to protect against a full outage of a public cloud provider, we introduced Cluster Linking and Schema Linking. For a typical deployment that's 100x less downtime, and it's why we say that Confluent Cloud offers beyond 10x better availability than Apache Kafka. And the best part? This is all built in to the product and doesn't require additional FTEs to manually maintain a strong SLA on your own.

¹ SLA for AWS components: max 99.5% for individual EC2 instances, 99.99% for EBS volumes, 99.99% for elastic load balancers.
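The footnoted arithmetic is straightforward to check. This sketch multiplies the three component SLAs and converts each availability figure into allowed downtime per year:

```python
# Composite availability is the product of the component SLAs, because the
# deployment is only up when every component is up.
HOURS_PER_YEAR = 24 * 365

def downtime(availability: float) -> float:
    """Allowed downtime in hours per year for a given availability."""
    return (1 - availability) * HOURS_PER_YEAR

oss_ceiling = 0.995 * 0.9999 * 0.9999   # EC2 x EBS x ELB ≈ 0.9948

print(f"Self-managed ceiling: {oss_ceiling:.2%}, "
      f"~{downtime(oss_ceiling):.0f} hours/year of allowed downtime")
print(f"Confluent Cloud SLA:  99.99%, "
      f"~{downtime(0.9999) * 60:.0f} minutes/year")
```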
10x durability through automatic auditing services

Durability is the other side of resiliency, and is a measure of data integrity. Apache Kafka primarily guarantees high durability through redundancy. We've further built robust tooling and durability auditing services to protect your data from any corruption or loss, as the toy sketch below illustrates.

[Figure: The durability auditing service spans the Kafka brokers, broker-local storage, and object storage, feeding dashboards and alerts.]

How 10x resiliency can reduce your total cost of ownership

- Offload Kafka maintenance: Building and maintaining a highly durable Kafka deployment demands constant keep-the-lights-on activities that Confluent Cloud takes off your plate.

Read our blog post to learn more about how you can leave Kafka reliability worries behind.
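To make the auditing idea concrete, here is a toy sketch of one common approach (checksums recorded at write time and re-verified later). It illustrates the concept only and is not Confluent's actual implementation:

```python
import binascii

def checksum(payload: bytes) -> int:
    """CRC32 fingerprint recorded when a record is first written."""
    return binascii.crc32(payload)

def audit(recorded: dict[int, int], reread: dict[int, bytes]) -> list[int]:
    """Compare re-read payloads (offset -> bytes) against recorded checksums
    and return the offsets that no longer match, i.e. corrupted or lost data."""
    return [offset for offset, payload in reread.items()
            if checksum(payload) != recorded.get(offset)]

# A non-empty result here would feed the dashboards and alerts in the figure.
assert audit({0: checksum(b"order-1")}, {0: b"order-1"}) == []
```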
Take Confluent Cloud for a test spin and never look back

With Kafka at its core, Confluent offers a truly cloud-native service that enables your business to set its data in motion while avoiding the headaches of low-level data infrastructure management. It's the only cloud Kafka service with enterprise-grade features, security, and zero ops burden for all of your data streaming needs, available everywhere.

"With Confluent Cloud, we no longer have to chase down brokers, spin them back up and see how to recover them. We see much less unbalanced partitions and don't have to think about all kinds of edge cases with the brokers that we used to handle in the past. So I have to say that the resiliency of our…"
Natan Silnitsky

An easier, safer, and more cost-effective way to stream real-time data is waiting.

GET STARTED
Get started with Confluent Cloud for free.