DSX Developer Ebook4 FINAL PDF
DataStax Enterprise (DSE) was first released in 2011 and is the most
advanced, enterprise-proven distribution of Cassandra available.
Its masterless architecture is designed to handle big data workloads
across multiple nodes with no single point of failure.
Cassandra’s peer-to-peer architecture guarantees no
single point of failure to collect and store streaming
data from infrastructure instruments and enables us
to write and read from any node with minimal latency.
Praveen Kumar, Sr. Manager Of Emerging Technologies & Platform, Equinix
Model. Code. Go.
learning Cassandra and build the next generation of powerful applications.
We’ll cover the key differences between Cassandra and relational
databases, how to translate your relational database skills to
Cassandra, and how to get started quickly. We’ll also tell you
about resources that will help you on your journey, including:
DataStax Documentation
downloads, tools, and much more.
Relational and Cassandra. Compared.
We realized that we just couldn’t scale effectively
with a traditional MySQL solution. In fact, we couldn’t
even continue to offer the same level of service to our
customers, simply because the speed of access to our
data would have deteriorated as the amount of data grew.
That’s when we decided to move forward with Cassandra.
Harry Robertson, Tech Lead, Ooyala
Key architectural differences between Cassandra and relational databases are as follows:
            Relational            Cassandra
Table       An array of arrays    A list of nested key-value pairs
Getting Started
DataStax has simplified the getting started process for DataStax Enterprise and DataStax
Distribution of Apache Cassandra.
You can choose to run them on-premises or in any kind of cloud, including managed, public,
private, hybrid, or multi-cloud. The DataStax Distribution of Apache Cassandra provides a
production-ready version of Apache Cassandra and grants access to the DataStax Bulk,
DataStax Apache Kafka Connector, Production Docker Image, developer tools, and a range
With DataStax Enterprise, you get even faster performance, as well as greatly simplified
database operations and management features with OpsCenter, advanced developer tools
and security features for more operational simplicity, and multi-model development and
mixed workloads with options like DataStax Graph, DataStax Analytics, and DataStax Search.
Want to fast-track learning Apache Cassandra? Take the free self-paced training
and master Cassandra’s internal architecture with hands-on exercises.
Keyspaces
In Cassandra, a keyspace is like the schema concept in relational database management
systems. It’s the top-level object, and there is typically only one keyspace per application.
Keyspaces contain tables, materialized views, and user-defined types, functions, and
aggregates, and they control the replication for the objects they contain at each
datacenter in the cluster.
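To make per-datacenter replication concrete, here is a minimal Python sketch that renders a CQL CREATE KEYSPACE statement. The helper name, the keyspace name (shop), and the datacenter names (dc_east, dc_west) are assumptions made up for illustration; only the NetworkTopologyStrategy replication class comes from Cassandra itself.

```python
def create_keyspace_cql(name, replicas_per_dc):
    """Render a CREATE KEYSPACE statement with per-datacenter replication.

    replicas_per_dc maps a datacenter name to its replica count. All
    names passed in below are hypothetical examples.
    """
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    # Sort the datacenters so the rendered statement is deterministic.
    for dc, count in sorted(replicas_per_dc.items()):
        options.append(f"'{dc}': {int(count)}")
    replication = "{" + ", ".join(options) + "}"
    return f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = {replication};"


# Hypothetical application keyspace: 3 replicas in one datacenter, 2 in another.
statement = create_keyspace_cql("shop", {"dc_east": 3, "dc_west": 2})
print(statement)
```

The resulting string could be run from cqlsh or passed to a driver; every table created in the keyspace then inherits these replica counts.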
Tables
The next structural unit is the table: data is stored in tables containing
rows of columns. Tables can be created, dropped, and altered at runtime without
blocking updates and queries. Just like a relational database, Cassandra tables
require a primary key. The difference is that the first element in a primary key
is called a partition key. The partition key has a special use in Apache Cassandra
beyond showing the uniqueness of the record in the database. Its other purpose is
one that is critical in distributed systems: determining data locality.
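The idea that a partition key determines data locality can be pictured with a toy token ring. The Python sketch below is a simplified model, not Cassandra's implementation: the class name, node names, and key are hypothetical, and MD5 stands in for the hash function (Cassandra's default partitioner is Murmur3-based).

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring: each node owns the token range ending at its token."""

    def __init__(self, nodes):
        # Place every node on the ring at a token derived from its name.
        self._ring = sorted((self._token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(value):
        # MD5 is only a stand-in hash for this sketch.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, partition_key):
        # Pick the first node whose token is >= the key's token,
        # wrapping around to the start of the ring if none is.
        idx = bisect.bisect_left(self._tokens, self._token(partition_key))
        return self._ring[idx % len(self._ring)][1]


ring = TokenRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("customer:42")
print(owner)
```

Because the owner depends only on the hash of the partition key, every row that shares a partition key lands on the same node, which is why choosing the partition key is a data-placement decision as well as a uniqueness decision.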
Schema Changes
In Apache Cassandra, schema changes naturally take time to propagate to all nodes
in the cluster depending on the size, network capacity, and load on the cluster. As a
result, best practices suggest making these changes one at a time and ensuring they
have fully spread throughout the cluster before moving on to subsequent alterations.
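One way to picture "one change at a time" is a loop that applies a change and then waits until every node reports the same schema version before continuing. The Python sketch below uses a simulated cluster; the function names, the FakeCluster class, and the example ALTER statements are all hypothetical, and how a real cluster exposes its schema versions is outside the scope of this sketch.

```python
import time


def schema_in_agreement(versions_by_node):
    """True when every node reports the same schema version."""
    return len(set(versions_by_node.values())) == 1


def apply_sequentially(changes, apply_change, fetch_versions,
                       max_checks=10, pause_seconds=0.0):
    """Apply schema changes one by one, waiting for agreement after each."""
    applied = []
    for change in changes:
        apply_change(change)
        for _ in range(max_checks):
            if schema_in_agreement(fetch_versions()):
                break
            time.sleep(pause_seconds)
        else:
            # No agreement within max_checks polls: stop instead of piling on.
            raise TimeoutError(f"schema did not converge after: {change}")
        applied.append(change)
    return applied


class FakeCluster:
    """Simulated two-node cluster where node n2 lags one poll behind."""

    def __init__(self):
        self.version = 0
        self.lagging = False

    def apply(self, change):
        self.version += 1
        self.lagging = True

    def versions(self):
        n2 = self.version - 1 if self.lagging else self.version
        self.lagging = False  # n2 catches up before the next poll
        return {"n1": self.version, "n2": n2}


cluster = FakeCluster()
done = apply_sequentially(
    ["ALTER TABLE t ADD a int", "ALTER TABLE t ADD b int"],
    cluster.apply,
    cluster.versions,
)
print(done)
```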
Introducing CQL
(SQL Skills Welcome)
If you’re familiar with SELECT, INSERT, UPDATE, and DELETE in SQL, you’re ready
for CQL and can refer to our CQL Quick Reference Guide. We’ve also highlighted
some of the key similarities and differences below:
Did you know that in Cassandra, data isn’t deleted in the same way it is in an RDBMS?
Apache Cassandra is designed for high write throughput and avoids reads-before-writes.
It uses SSTables, which are immutable once written. So, a delete is an update and updates
are actually inserts (into new SSTables).
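The "a delete is an update, and an update is an insert" idea can be sketched with a toy store in Python. This is a deliberately simplified model under stated assumptions: each mutation becomes its own immutable SSTable (real Cassandra buffers writes in memory and flushes them in batches), and the ToyStore class and TOMBSTONE marker are hypothetical names.

```python
TOMBSTONE = object()  # marker written in place of a value when a key is deleted


class ToyStore:
    """Toy append-only store: nothing already written is ever modified."""

    def __init__(self):
        self._sstables = []  # oldest first; each entry is an immutable tuple
        self._clock = 0      # logical write timestamp

    def _write(self, key, value):
        self._clock += 1
        # Simplification: every mutation becomes one new immutable SSTable.
        self._sstables.append(((key, (self._clock, value)),))

    def insert(self, key, value):
        self._write(key, value)

    update = insert  # an update is just another insert

    def delete(self, key):
        # A delete writes a tombstone instead of removing older data.
        self._write(key, TOMBSTONE)

    def read(self, key):
        latest = None
        for table in self._sstables:
            for stored_key, cell in table:
                if stored_key == key and (latest is None or cell[0] > latest[0]):
                    latest = cell
        if latest is None or latest[1] is TOMBSTONE:
            return None
        return latest[1]

    @property
    def sstable_count(self):
        return len(self._sstables)


store = ToyStore()
store.insert("user:1", "Ada")
store.update("user:1", "Ada L.")
after_update = store.read("user:1")
store.delete("user:1")
after_delete = store.read("user:1")
print(after_update, after_delete, store.sstable_count)
```

The older versions and the tombstone remain on disk after the delete, and the read path resolves them by timestamp. Reclaiming that space is a separate background concern in a real system.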
Balancing Transactional Integrity
Penn Mutual started out with a traditional RDBMS
approach for the persistence layer of their Core
Service but soon realized that it could not meet
their requirements for application performance or
scalability. They turned to DataStax Enterprise instead.
• Atomicity. Write and delete operations are atomic at the partition
level. Insertions or updates of two or more rows in the same
partition are treated as one write operation.
• Isolation. Write and delete operations are performed with full row-
level isolation, so a write to a row within a single partition on a
single node is visible only to the client performing the operation
until that operation completes.
• Durability. Writes are durable. All writes to a replica node are
recorded both in memory and in a commit log on disk before they
are acknowledged as a success.
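The durability rule above (record the write in memory and in an on-disk commit log, then acknowledge) can be sketched as follows in Python. This is a toy model, not Cassandra's storage engine: the ToyReplica class, the JSON log format, and the choice to fsync on every write are assumptions made for illustration.

```python
import json
import os
import tempfile


class ToyReplica:
    """Toy replica that logs every write to disk before acknowledging it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}
        self._replay()

    def _replay(self):
        # On startup, rebuild the in-memory state from the commit log.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.memtable[record["key"]] = record["value"]

    def write(self, key, value):
        # 1. Append to the commit log and force it to disk (assumed policy).
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Record the write in memory.
        self.memtable[key] = value
        # 3. Only now acknowledge success to the caller.
        return True


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
replica = ToyReplica(log_path)
acked = replica.write("sensor-1", 21.5)

# Simulate a restart: a fresh replica recovers the write from the log.
recovered = ToyReplica(log_path)
print(acked, recovered.memtable)
```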
Ways to Migrate Your Workloads
We pulled in one year of data from Oracle and once
we got it into DataStax Enterprise, built on the best
distribution of Apache Cassandra, it was smooth ride
and it was processing at a very high rate.
Mukram Aziz, Sr. Manager of Data Services, Capital One
How do I ingest from an existing relational database (RDBMS) to an Apache Cassandra or
DataStax Enterprise cluster?
There are many ways to migrate data from your relational databases into Apache
Cassandra and DataStax Enterprise. Here’s where you can start.
• DataStax Bulk Loader (dsbulk). Loads and unloads CSV or JSON data in and out of the
DSE database. DataStax Bulk Loader efficiently and reliably loads small or large
amounts of data, supporting developer and production environments.
• CQL COPY FROM. Imports data from a comma-separated values (CSV) file or a delimited
text file into an existing table, mainly for datasets that have fewer than 2 million rows.
• sstableloader. Bulk loads large volumes of external data into a cluster by streaming
a set of SSTable data files to a live cluster.
Several Extract-Transform-Load (ETL) tools like Talend, Informatica, and StreamSets also support Apache
Cassandra and DataStax Enterprise, providing sophisticated data transformation logic, point-and-click
interfaces, scheduling, and more, to manage data movement.
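To illustrate what loading a CSV export into an existing table involves, the Python sketch below turns CSV text into a parameterized INSERT statement plus one tuple of values per row. It is a hand-rolled illustration, not a substitute for the tools above: the insert_plan function, the shop.users table, and the sample data are hypothetical, values stay as strings (no type conversion), and the header names are trusted as column names.

```python
import csv
import io


def insert_plan(table, csv_text):
    """Turn CSV text into one parameterized INSERT plus its row values."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first line names the target columns
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    statement = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    # Skip blank lines; every remaining row becomes one set of bind values.
    rows = [tuple(row) for row in reader if row]
    return statement, rows


# Hypothetical export from a relational users table.
exported = "id,name\n1,Ada\n2,Lin\n"
statement, rows = insert_plan("shop.users", exported)
print(statement)
print(rows)
```

In practice each tuple would be bound to a prepared statement through a driver; for anything beyond a small dataset, the purpose-built loaders listed above are the better fit.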
Next Steps
Take a Learning Path to gain an expert understanding of Apache Cassandra and DataStax Enterprise principles
related to your role. Each Learning Path is composed of a sequence of recommended courses for your role, curated
by our curriculum engineers. When you complete your path, you will receive a printable Certificate of Completion.
You can follow your progress along the path in the chart below and switch to a different path at any time.
Conclusion
Moving to Apache Cassandra with DataStax is faster and easier than ever before. That means that you can use the same technology
that’s powering the world’s leading enterprises—like Capital One, Cisco, Comcast, Delta Airlines, eBay, Macy’s, McDonald’s, Safeway,
Sony, and Walmart—and be up and running in no time.
As the world’s leading Cassandra experts, DataStax provides the community with online training, certification, and full documentation;
DataStax Examples provide sample code for developers to reference and shorten the path to getting started. You can also access
DataStax expertise directly through professional services and support options that can ensure the right level of development or
deployment help whenever you need it in your journey.
About DataStax
DataStax delivers the only active everywhere hybrid cloud database built on Apache Cassandra™: DataStax Enterprise
and DataStax Distribution of Apache Cassandra, a production-certified, 100% open source compatible distribution of
Cassandra with expert support. The foundation for contextual, always-on, real-time, distributed applications at scale,
DataStax makes it easy for enterprises to seamlessly build and deploy modern applications in hybrid cloud. DataStax
also offers DataStax Managed Services, a fully managed, white-glove service with guaranteed uptime, end-to-end
security, and 24x7x365 lights-out management provided by experts at handling enterprise applications at cloud scale.
More than 400 of the world’s leading brands like Capital One, Cisco, Comcast, Delta Airlines, eBay, Macy’s, McDonald’s,
Safeway, Sony, and Walmart use DataStax to build modern applications that can work across any cloud. For more
information, visit www.DataStax.com and follow us on Twitter @DataStax.
© 2019 DataStax, All Rights Reserved. DataStax, Titan, and TitanDB are registered trademarks of DataStax, Inc. and its subsidiaries in the United States and/or other countries.
Apache, Apache Cassandra, and Cassandra are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States, and/or other countries.
datastax.com