
E-book Series

Developer's Guide to Getting Started with Azure Cosmos DB

Azure Cosmos DB, the globally distributed database service from Microsoft

Who should read this?
This eBook was written for developers who are thinking of building new cloud apps or moving existing NoSQL apps to the cloud. It provides a brief primer on NoSQL, followed by an overview of Azure Cosmos DB and the value it brings to developers building apps for NoSQL workloads. We also provide an introduction to the core concepts you'll need to understand how best to put Azure Cosmos DB to use, followed by some resources to help you get started.

Why Azure Cosmos DB?

Azure Cosmos DB, the globally distributed database service from Microsoft, is unique in many ways. It is ideal for distributed apps that require extremely low latency at a global scale and enables you to avoid the all-or-nothing tradeoffs you face with most other NoSQL (nonrelational) databases by providing:
• native support for all major NoSQL data models—including key-value, document, graph, and columnar
• turnkey global distribution
• multi-master support
• elastic scaling of throughput and storage
• five well-defined consistency levels
• data indexing as data is ingested, without requiring you to deal with schema or index management

It does all this with guaranteed high availability and low latency, all backed by industry-leading SLAs.
Contents

NoSQL: A quick primer
    NoSQL defined
    When to consider NoSQL
    How to choose a NoSQL database
Azure Cosmos DB: A globally distributed, multi-model database
    Key features and capabilities
    Common use cases
Core concepts and considerations
    Azure Cosmos DB accounts
    Resource model: Databases, containers, and items
    Partitioning and horizontal scalability
    Choosing a good partition key
    Request units and provisioned throughput
    Global distribution
    Consistency
    Automatic indexing
Architectural considerations
    Change feed
    Building serverless apps with Azure Cosmos DB and Azure Functions
    Server-side programming
    Apache Spark to Azure Cosmos DB Connector
    Built-in operational analytics with Apache Spark (in preview)
Operational considerations
    Cost optimization with Azure Cosmos DB
    Security
    Online backup and restore
    Compliance
Building an app with Azure Cosmos DB
    Choosing the right API
    One database, multiple APIs
    Choosing an API
    Getting Started with the SQL API
        Quickstarts
        Tutorials
        How-to guides
        Additional resources
    Getting Started with the Cassandra API
        Quickstarts
        Tutorials
        Cassandra and Spark
        How-to guides
    Getting Started with the Azure Cosmos DB for MongoDB API
        Quickstarts
        Tutorials
        How-to guides
    Getting Started with the Gremlin API
        Quickstarts
        Tutorials
        How-to guides
    Getting Started with the Table API
        Quickstarts
        Tutorials
        How-to guides
Conclusion

© 2019 Microsoft Corporation. All rights reserved. This document is provided “as is.” Information and views expressed in this document,
including URL and other internet website references, may change without notice. You bear the risk of using it. This document does not
provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your
internal, reference purposes.
NoSQL: A quick primer

If you work with databases, you've probably heard of NoSQL. Even if you haven't, odds are that you depend on NoSQL databases more than you know—if not as a developer, then as an end user. It's becoming more and more popular with today's largest companies for its flexibility and scalability, in areas ranging from gaming and e-commerce to big data and real-time web apps. The use cases for NoSQL are continuing to grow, and, with the availability of NoSQL database services in the cloud, the benefits that it provides are within the reach of all.

NoSQL databases have been around since the 1960s, under various names. However, their popularity began to surge—and the NoSQL label was attached—much more recently, as leading technology companies began adopting NoSQL databases for their ability to handle petabytes of rapidly changing, unstructured data. But what exactly is a NoSQL database and, more importantly, what can it do for you as a developer?

NoSQL defined
NoSQL is the name for a category of databases that are nonrelational in nature, meaning that data storage and retrieval aren't handled using a predefined schema with structured rows and columns, as in a relational database. Instead, NoSQL databases employ flexible data models that make them highly effective at handling unstructured, unpredictable data—often with blazing-fast query speeds. By design, most NoSQL databases also support horizontal scalability.

Common NoSQL data models
The most common types of NoSQL data models include:
• Key-value, which pairs keys and values using a hash table—in a manner similar to how a file path points to a file containing some data. The key is used to reference the value, which can be any arbitrary value—for example, an integer, a string, a JSON structure (aka a document), a JPEG, an array, and so on.
• Document databases, which extend the concept of the key-value database by organizing entire documents into groups often called collections. A key can be any attribute within the document, within which data is encoded using a standardized format, such as XML or JSON. (In general, key-value stores don't support nested key-value pairs, whereas document databases do. What's more, because document databases store their data in a format that the database can understand, they allow queries on any attribute within a document.)
• Columnar (or wide-column) databases, which generally store the values of one or more columns together in a storage block. Unlike relational databases, a columnar database can efficiently store and query across rows that contain sparsely filled columns.
• Graph, which uses a data model based on nodes, edges, and properties to represent interconnected data—such as relationships between people in a social network.

It's worth noting that most NoSQL databases can also handle highly structured data—they just aren't limited to it, nor do you need to define a database schema ahead of time. Similarly, if you want to add new data types to a NoSQL database, unlike with a relational database, you won't need to stop what you're
doing, add new columns, and then move your data to the new schema. This can be a big advantage when it comes to agile development and more frequent software release cycles.

Horizontal scalability
Another factor that contributes to the rapid adoption of NoSQL databases is that they're designed to scale out, or scale horizontally, which makes them capable of handling a virtually unlimited amount of data. That's not to say that you can't scale out a relational database, but it can get tricky. Many NoSQL databases, in comparison, have inherent capabilities that allow them to scale out automatically and distribute their data over an arbitrary number of servers.

Replication
Most NoSQL databases are distributed and support some form of automatic replication, which can help maintain service availability in the event of a planned or unplanned outage. Replication also lets you distribute copies of your data across multiple geographies. For geographically distributed apps, this means that someone using an app in one part of the world can read from a local replica rather than waiting for data to be retrieved from the other side of the globe.

When to consider NoSQL
At this point you might be asking, "So when should I use a NoSQL database?" To answer this, it's worth starting with an acknowledgment that "NoSQL" can also mean "Not only SQL." As we've stated, NoSQL databases can also handle structured data—and can often be accessed using a structured query language like SQL. Although at first that might seem to muddy the picture when it comes to the SQL or NoSQL question, it really doesn't—it just shifts the focus to what your app needs to do, and what's required of your database to support that.

If you need to handle unstructured data at any scale, NoSQL might be a good place to start. Now consider the other characteristics of many NoSQL databases, such as low latency, horizontal scalability, and automatic replication. Clearly, these characteristics lend themselves well to a distributed app that requires fast performance across multiple geographic regions—achieved by using the enabling characteristics of NoSQL to put a copy of your data in each geography where your users reside. Similarly, the low latency of NoSQL makes it a strong candidate for delivering real-time customer experiences—like you might need for e-commerce or gaming. NoSQL is also proving popular in other scenarios, such as building serverless apps and implementing big data and analytics over operational, transactional data.

The takeaway here is that, at the end of the day, the decision on when to use NoSQL is about more than just whether your data is structured or not—it's about what your app needs to do, and how easily and flexibly you can achieve that.

How to choose a NoSQL database
Flexibility in handling unstructured data, inherent horizontal scalability, and built-in replication are all reasons why NoSQL databases are becoming more and more popular. And with so many of them to choose from, developers can usually find one that's well suited to their data. Such specialized, purpose-built NoSQL databases can also serve queries with blazing speed in many cases, which is critical in delivering real-time user experiences at scale—gaming and e-commerce are two good examples. However, that's not to say that there aren't some potential tradeoffs and other important
considerations associated with choosing a NoSQL database.

Programming models and APIs
If you've worked with relational databases, you're probably aware that they're not always a good match for the data structures you use when programming. Many NoSQL databases, however, are aggregate oriented, with an aggregate defined as a collection of data that you interact with as a unit—making them a much more natural fit for modern object-oriented programming languages.

As such, when it comes to choosing a NoSQL database, you'll probably want to start by choosing a data model—and then evaluate the NoSQL databases that support it, along with the programming languages and SDKs that each database supports. Does the database lock you into a given SDK and language, or will you have a choice in the matter? And does the SDK have what you need to get the most out of your distributed database—such as transparent multihoming APIs to ensure that your app can properly operate in case of a planned or unplanned failover?

Consistency vs. latency
Because a replicated NoSQL database is, in effect, a distributed system, you'll need to be aware of the CAP theorem. Also called Brewer's theorem, it states that it's impossible for a distributed data store to simultaneously provide more than two of the following three guarantees:
• Consistency—ensuring that every request receives the most recent data
• Availability—ensuring that every request receives a response
• Partition tolerance—ensuring that the system continues to operate in the event of a failure between network nodes

The impossibility result of the CAP theorem proves that such a system cannot both remain highly available and deliver linearizable consistency in the event of a network failure (in which replicas are unable to talk to each other). Similarly, the CAP theorem shows that, in the absence of a network failure, you can achieve both availability and consistency. However, even in the absence of a network failure, you still need to consider tradeoffs between consistency and latency—formally codified in the PACELC theorem—due to the fact that data packets sent over a network wire cannot travel faster than the speed of light.

Some NoSQL databases don't guarantee consistency. Most of them, however, let you choose from either end of the spectrum: strong consistency (you'll get the latest data, but you might need to wait) or eventual consistency (you'll get a fast response, but the data might be stale). Some NoSQL databases support other consistency levels, which typically fall in between those extremes. The key takeaway here is that, all other things considered equal, the more flexibility and control you have in terms of consistency levels—and thus the tradeoffs between consistency and latency—the better off you'll be.

On-premises vs. cloud—and which cloud?
NoSQL databases have been around for years, so you can find many that were designed to run on-premises. However, it's worth noting that NoSQL databases really started becoming popular with the advent of the cloud—and for good reason: their distributed nature and horizontal scalability make them an ideal fit. In fact, odds are that, regardless of the data model you choose, you'll find several cloud options. But as you're probably aware, all clouds are not created equal. So how do you choose?
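To make the strong-versus-eventual tradeoff concrete, here is a deliberately simplified, in-memory sketch in Python. It is not any real database's API—the class and method names are invented—but it shows why a strong read costs latency (it waits for replication to catch up) while an eventual read is fast but possibly stale:

```python
class ToyDistributedStore:
    """A deliberately simplified two-replica store (not a real database API).

    Writes land on the primary immediately and reach the secondary only when
    replication runs. Reads are served from the secondary, so an "eventual"
    read can be stale, while a "strong" read waits for replication first.
    """

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = []  # writes not yet replicated to the secondary

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))  # replication happens later

    def replicate(self):
        for key, value in self.pending:
            self.secondary[key] = value
        self.pending.clear()

    def read(self, key, consistency="eventual"):
        if consistency == "strong":
            self.replicate()  # models waiting for the replica to catch up
        return self.secondary.get(key)


store = ToyDistributedStore()
store.write("user:1", {"name": "Ada"})

print(store.read("user:1"))                        # stale: prints None
print(store.read("user:1", consistency="strong"))  # waits: prints {'name': 'Ada'}
```

The intermediate levels discussed later in this e-book (bounded staleness, session, consistent prefix) sit between these two endpoints, bounding how stale the secondary read is allowed to be.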
In approaching this decision, in addition to programming languages/APIs and consistency/latency tradeoffs, you might want to consider the following:
• Supported data models. Does the cloud provider support all the data models that I might want to use? And if so, will I need to juggle a bunch of different database services?
• Deployment and operations. How easily can I deploy my database, and then replicate it to other regions if needed? How tedious are the setup and maintenance requirements? Do I get a fully managed service, or will I need to worry about patching and planned downtime?
• Geographic presence. Where are the cloud provider's datacenters? Can I put my data where I want it? How will I handle important regulatory and data sovereignty issues, such as the European Union's General Data Protection Regulation?
• Ease of replication. What's the process for replicating my database to a different geographic region? How complex is the process, and how long will it take?
• Scalability. How will I ensure that the database has the resources required for adequate performance—and scale for growth? Will I need to pre-provision and pay for resources that I might never use, or can I scale up and down on demand to handle unpredictable workloads?
• High availability. What will happen in the event of an unexpected failure? Is high availability built into the service, or will it be an added complication that I'll need to worry about?
• Service levels. Does the cloud service guarantee a certain level of availability? Does it have any latency guarantees? And if so, are they "empty promises" or are they financially backed?
• Ecosystem. How tightly integrated is the database with the rest of the cloud platform? Does it provide all the services I need, and can they be quickly stitched together to build a complete solution?

Finally, in selecting a NoSQL database service, it's worth taking a step back and examining its cloud platform as a whole. Rarely does any database exist in isolation, so you'll want to make sure the service you choose—and the platform upon which it resides—can provide everything that you'll need to put your NoSQL database to use. The specific services you'll need will depend on your app, such as the ability to integrate your NoSQL database with other app components via serverless functions. Other cloud services that you might need are more scenario specific, such as those for ingesting massive volumes of IoT data, implementing real-time streaming analytics, or building AI into your apps. And don't forget about ease of integration, such as triggering a serverless function when your NoSQL data changes. After all, even if a cloud platform provides all the services you need, you don't want to tie them together with paper clips and glue.
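Before turning to Azure Cosmos DB itself, it's worth seeing the primer's central point—schema-free documents, queryable on any attribute—in miniature. The sketch below uses plain Python structures rather than any real database API, and the collection and field names are invented:

```python
# A "collection" of JSON-like documents; each document can have its own shape.
products = [
    {"id": "1", "type": "shirt", "size": "M", "color": "blue"},
    {"id": "2", "type": "phone", "storageGB": 128},            # different attributes
    {"id": "3", "type": "shirt", "size": "L", "fit": "slim"},  # new field, no migration
]

def query(collection, **filters):
    """Return documents matching every given attribute—no predefined schema."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in filters.items())]

print([doc["id"] for doc in query(products, type="shirt")])  # prints ['1', '3']
print(query(products, storageGB=128)[0]["type"])             # prints phone
```

Adding the `fit` attribute to the third document required no schema change and no data migration—the point made above about agile release cycles.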
Azure Cosmos DB: A globally distributed, multi-model database

Azure Cosmos DB, the globally distributed database service from Microsoft, is a lot more than just another NoSQL database. It provides native support for all major NoSQL data models—key-value, document, graph, and columnar—exposed through multiple APIs so you can use familiar tools and frameworks. Azure Cosmos DB also delivers turnkey global distribution, multi-master support, and elastic scaling of throughput and storage, making it ideal for apps that require extremely low latency, anywhere in the world.

With Azure Cosmos DB, you get things that you can't find anywhere else. It's the only database service that offers five well-defined consistency levels, enabling you to avoid the all-or-nothing tradeoffs you face with most NoSQL databases. It even indexes your data for you as it's ingested, without requiring you to deal with schema or index management. And it delivers guaranteed high availability and low latency, all backed by industry-leading service-level agreements (SLAs).

Best of all, because Azure Cosmos DB is a fully managed Microsoft Azure service, you won't need to manage virtual machines, deploy and configure software, or deal with upgrades. Every database is automatically backed up, protected against regional failures, and encrypted, so you won't have to worry about those things either—leaving you with even more time to focus on your app.

Check out our technical training series
This seven-part webinar series covers the following topics:
• Technical overview of Azure Cosmos DB
• Build real-time personalized experiences with AI and serverless technology
• Using the Gremlin and Table APIs with Azure Cosmos DB
• Build or migrate your MongoDB app to Azure Cosmos DB
• Understanding operations of Azure Cosmos DB
• Build serverless apps with Azure Cosmos DB and Azure Functions
• Apply real-time analytics with Azure Cosmos DB and Spark

A brief history of Azure Cosmos DB
As a cloud service, Azure Cosmos DB is built from the ground up for multitenancy, elastic scalability, high availability, and global distribution—with low latencies and intuitive, predictable consistency levels. The work began in 2010, when developers at Microsoft set out to build a database that could meet those fundamental requirements for internal global apps. The result was a new fully managed nonrelational database service called Azure DocumentDB.

Seven years later, we announced Azure Cosmos DB, the first globally distributed, multi-model database service for building planet-scale apps. Since then, we've added support for new APIs, a native Apache Spark connector, the Azure Cosmos DB Change Feed Processor Library (which provides a sorted list of documents in the order in which they were modified), support for Azure Cosmos DB in
the Azure Storage Explorer, and a number of features for monitoring and troubleshooting.

In January 2018, InfoWorld's 2018 Technology of the Year awards recognized Azure Cosmos DB, zeroing in on its "innovative approach to the complexities of building and managing distributed systems."

So just how did we achieve this? By design, Azure Cosmos DB does three things very well:
• Partitioning, which is what enables elastic scale out of storage and throughput.
• Replication, which enables turnkey global distribution—augmented with a set of well-defined consistency levels to let you tune consistency versus performance.
• Resource governance, through which Azure Cosmos DB can offer comprehensive SLAs encompassing the four dimensions of global distribution that customers care about the most: throughput, latency at the ninety-ninth percentile, availability, and consistency.

Key features and capabilities
To understand how you can use Azure Cosmos DB to build infinitely scalable, highly responsive global apps, it's worth looking at its key capabilities in more detail. Later in this e-book, we'll take a deeper dive into many of these same concepts.

Multiple data models. Azure Cosmos DB is the only fully managed service that natively supports document, graph, key-value, and columnar NoSQL data models—all in one place. These data models are supported through the following APIs, with SDKs available in multiple languages:
• SQL API: An API for accessing the core schema-less JSON document-oriented database engine with rich SQL querying capabilities.
• Azure Cosmos DB API for MongoDB: An API for accessing the document-oriented, massively scalable MongoDB-as-a-service that you can use to easily move existing MongoDB apps to the cloud. The MongoDB API enables connectivity between Azure Cosmos DB and existing MongoDB libraries, drivers, tools, and apps.
• Cassandra API: An API for accessing the column-based, globally distributed Cassandra-as-a-service, which makes it easy to move existing Apache Cassandra apps to the cloud. The Cassandra API enables connectivity between Azure Cosmos DB and existing Cassandra libraries, drivers, tools, and apps.
• Gremlin (graph) API: An API to the fully managed, horizontally scalable database service that supports Open Graph APIs (based on the Apache TinkerPop specification).
• Azure Table API: An API built to provide automatic indexing, guaranteed low latency, global distribution, and other features of Azure Cosmos DB to existing Azure Table storage apps with very minimal effort.

Figure 1. Azure Cosmos DB natively supports document, graph, key-value, and columnar data models.
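As a rough illustration of Figure 1, here is one small social-network fact set expressed in each of the four data models, using plain Python structures. The shapes are invented for illustration and don't correspond to any API's wire format:

```python
# Key-value: an opaque value looked up by key.
kv_store = {"user:1": '{"name": "Ada", "city": "Paris"}'}

# Document: the value is structured, and any attribute is queryable.
document = {"id": "user:1", "name": "Ada", "city": "Paris", "follows": ["user:2"]}

# Columnar (wide column): values grouped by column; sparse rows cost little.
columns = {
    "name": {"user:1": "Ada", "user:2": "Alan"},
    "city": {"user:1": "Paris"},  # user:2 has no city—nothing is stored for it
}

# Graph: nodes, edges, and properties for interconnected data.
nodes = {"user:1": {"name": "Ada"}, "user:2": {"name": "Alan"}}
edges = [("user:1", "follows", "user:2")]

# Each model answers different questions naturally—for example, a graph
# traversal for "whom does user:1 follow?":
followed = [dst for src, label, dst in edges
            if src == "user:1" and label == "follows"]
print(followed)  # prints ['user:2']
```

Which representation fits best depends on the queries your app asks most often—which is exactly why choosing a data model comes before choosing an API.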
Turnkey global distribution. Azure Cosmos DB is the only database service that delivers turnkey global distribution. It lets you distribute your data to any number of Azure regions with just a few mouse clicks, keeping your data close to your users to maximize app performance. With the Azure Cosmos DB multihoming APIs, your app always knows where the nearest copy of your data resides, without any configuration changes, even as you add and remove regions.

Multi-master support. With multi-master support (multi-region writes), you can write data to any region associated with your Azure Cosmos DB account and have those updates propagate asynchronously, enabling you to seamlessly scale both write and read throughput anywhere around the world. You'll get single-digit millisecond write latencies at the ninety-ninth percentile, 99.999 percent write (and read) availability, and comprehensive and flexible built-in conflict resolution. Multi-master support is crucial for building globally distributed apps and significantly simplifies their development.

Limitless, elastic scale out of storage and throughput. With Azure Cosmos DB, you pay only for the storage and throughput that you need—and can independently and elastically scale storage and throughput at any time, across the globe.

Guaranteed low latency. With its latch-free, write-optimized database engine, Azure Cosmos DB delivers guaranteed low latency. For a typical 1-KB item, reads are guaranteed to be under 10 milliseconds at the ninety-ninth percentile; indexed writes are guaranteed to be under 10 milliseconds at the ninety-ninth percentile, within the same Azure region. Median latencies are even lower, at under 5 milliseconds.

Five well-defined consistency options. Azure Cosmos DB is the only database service that offers five well-defined, practical, and intuitive consistency levels—ranging from strong to eventual. In between those two extremes, you get three intermediate consistency levels to choose from (bounded staleness, consistent prefix, and session), enabling you to fine-tune the tradeoffs between consistency and latency for your app.

No schema or index management. Azure Cosmos DB lets you rapidly iterate without worrying about schemas or indexes. The Azure Cosmos DB database engine is schema agnostic, and Azure Cosmos DB is the only database service that automatically indexes all the data it ingests, resulting in blazing-fast queries. It works across all supported data models, without the need for schemas or secondary indexes.

Global presence. As a foundational Azure service, Azure Cosmos DB is available in all regions where Azure is available—currently 54 regions worldwide.

Industry-leading security and compliance. When you choose Azure Cosmos DB, you run on Microsoft Azure—the world's most trusted cloud, with more compliance offerings than any other cloud provider. Data within Azure Cosmos DB is always encrypted, both at rest and in motion, as are indexes, backups, and attachments. Encryption is enabled by default, in a manner that's transparent to your app and has no impact on performance, throughput, or availability.

"Always on" availability. Azure Cosmos DB provides a 99.99 percent availability SLA for all single-region accounts and a 99.999 percent read availability SLA for all multi-region accounts. Automatic failover helps protect against the unlikely event of a regional outage, with all SLAs maintained. You can prioritize failover order for multi-region accounts and can manually trigger failover to test the end-to-end availability of your app—with guaranteed zero data loss.
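To see what the built-in conflict resolution mentioned under multi-master support has to do, consider two regions accepting writes to the same item before replication converges. The sketch below shows a last-writer-wins policy (resolving on a designated property, as Azure Cosmos DB does by default); the `_ts` values and region names are invented for illustration:

```python
def resolve_last_writer_wins(versions):
    """Keep the concurrent version with the highest conflict-resolution value."""
    return max(versions, key=lambda version: version["_ts"])

# Two regions accepted writes to the same item before replication converged.
concurrent_versions = [
    {"id": "cart:9", "items": 2, "_ts": 1001, "region": "West US"},
    {"id": "cart:9", "items": 3, "_ts": 1004, "region": "North Europe"},
]

winner = resolve_last_writer_wins(concurrent_versions)
print(winner["region"], winner["items"])  # prints: North Europe 3
```

Last-writer-wins is simple and deterministic, but it silently discards the losing write—one reason a database may also offer custom conflict-resolution hooks when your app needs to merge instead.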
Unmatched, enterprise-grade SLAs. With Azure Cosmos DB, you can rest assured that your apps are running on an enterprise-grade database service. In fact, Azure Cosmos DB is the first and only database service to offer industry-leading, financially backed SLAs for 99.999 percent high availability, latency at the ninety-ninth percentile, guaranteed throughput, and consistency.

Figure 2. Azure Cosmos DB offers industry-leading, financially backed SLAs.

Common use cases
Now that we've covered the key features and capabilities of Azure Cosmos DB, just how can you put them to use? As a fully managed, multi-model database service, Azure Cosmos DB is a good choice for a broad range of apps. It's especially well suited for event-driven serverless apps that require low latency and that might need to scale rapidly and globally. Add in its support for multiple data models and APIs and five consistency levels, and you have a NoSQL-compatible database service capable of supporting most any scenario where a traditional relational database isn't a good fit.

That said, here are common scenarios where Microsoft customers are using Azure Cosmos DB:
• Globally distributed apps. Azure Cosmos DB lets you build modern apps at a global scale, ensuring uncompromised performance no matter where your users are. You can easily put copies of your data in regions across the world, knowing you'll get guaranteed low latencies and built-in failover to ensure high availability and disaster recovery.
• Real-time customer experiences. The guaranteed low latency provided by Azure Cosmos DB makes it ideal for delivering real-time customer experiences and other latency-sensitive apps. And when you use Azure Cosmos DB together with Azure Databricks for its advanced analytics and machine learning capabilities, you can build apps that provide personalization and real-time recommendations.
• Internet of Things (IoT). Azure Cosmos DB lets you accommodate diverse and unpredictable IoT workloads—enabling you to scale instantly and elastically to handle sustained, write-heavy data ingestion, all with uncompromised query performance.
• E-commerce. Azure Cosmos DB supports flexible schemas and hierarchical data, making it well suited for storing product catalog data where different products have different attributes. This is one of the reasons why Azure Cosmos DB is used extensively in Microsoft's own e-commerce platforms.
• Gaming. Modern games rely on the cloud to deliver personalized content like in-game stats, social media integration, and leaderboards. Through its low-latency reads and writes, Azure Cosmos DB can help deliver an engaging, uncompromised in-game experience across large and changing user bases. At the same time, its instant, elastic scalability enables it to easily support the traffic spikes that are likely to occur during new game launches, online tournaments, and feature updates.
• Serverless apps. Azure Cosmos DB integrates natively with Azure Functions, making it easy to build event-driven, serverless apps that let you seamlessly scale data ingestion, throughput, and data volumes. Your data will be made available immediately and indexed automatically, with stable ingestion rates and query performance. And with the change feed support in Azure Cosmos DB, you can easily use changes in your data to kick off other actions and/or synchronize multiple data models in your event-driven app.
• Big data and analytics. Azure Cosmos DB integrates effortlessly with Azure Databricks for advanced analytics via Apache Spark, enabling you to implement machine learning at scale across fast-changing, high-volume, globally distributed data. The Spark to Azure Cosmos DB connector lets Azure Cosmos DB act as an input source or output sink for Spark jobs and can even push down predicate filtering to indexes within Azure Cosmos DB to improve the efficiency of Spark jobs.
• Migration of existing NoSQL workloads to the cloud. Azure Cosmos DB makes it easy to migrate existing NoSQL workloads to the cloud—in many cases, with no more than a change to a connection string in your app. With the Azure Cosmos DB MongoDB and Cassandra APIs, you can migrate on-premises MongoDB and Cassandra databases to Azure Cosmos DB, respectively, then continue to use your existing tools, drivers, libraries, and SDKs. You won't need to spend any more time managing an on-premises database and will benefit from all that Azure Cosmos DB brings to the table. The videos on the Azure Cosmos DB YouTube channel can help you get started.

On the following pages, we take a deeper look at these and other key capabilities of Azure Cosmos DB, including how they work and how to put them to use. We're confident that, by the time you finish reading, you'll be ready to choose an API and go hands-on with Azure Cosmos DB. Or, if you prefer to learn by doing, you can skip forward to Choosing a data model and API, get started with your chosen API, and refer to the Key Concepts section of this e-book on an as-needed basis.

If you'd prefer to watch a video, many of these same concepts are also covered in the first webinar in the Azure Cosmos DB Technical Training series.
Core concepts and considerations

Azure Cosmos DB accounts

To begin using Azure Cosmos DB, you'll need to create an Azure Cosmos DB account under your Azure subscription. If you don't have an Azure subscription, you can sign up for a free one. You can also try Azure Cosmos DB for free without an Azure subscription, without any charges or commitments.

An Azure Cosmos DB account is the fundamental unit of global distribution and high availability for Azure Cosmos DB. To globally distribute your data and throughput across multiple Azure regions, you can add Azure regions to your Azure Cosmos DB account at any time. When an Azure Cosmos DB account is associated with more than one region, you can configure the account to have one write region or multiple write regions (i.e., multi-master). Each Azure Cosmos DB account can also be configured with a default consistency level, which can be overridden on an as-needed basis.

Currently, you can create a maximum of 100 Azure Cosmos DB accounts under one Azure subscription. Each Azure Cosmos DB account is identified by a unique DNS name and supports a single Azure Cosmos DB API.

Manage an Azure Cosmos DB account provides detailed instructions on how to create an account, add or remove regions, configure multiple write regions, enable automatic failover, set failover priorities, and perform a manual failover. Most Azure Cosmos DB account management tasks can be performed by using the Azure portal, Azure CLI, or Azure PowerShell.

Figure 3. You can create an Azure Cosmos DB account in just a few minutes.
Resource model: Databases, containers, and items

Developers can start using Azure Cosmos DB by provisioning an Azure Cosmos DB account, which includes choosing an API. Entities under the Azure Cosmos DB account, called resources, are uniquely identified by a stable and logical URI and are represented as a JSON document. The overall resource model of an app using Azure Cosmos DB is a hierarchical overlay of the resources rooted under the Azure Cosmos DB account and can be navigated using hyperlinks.

An Azure Cosmos DB account manages one or more databases, which in turn manage users, permissions, and containers. Containers are schema agnostic, containing items (i.e., your data), stored procedures, triggers, and user-defined functions (UDFs). If your Azure Cosmos DB account is associated with multiple regions, then the containers within it will also contain merge procedures and conflicts.

Depending on the API selected when creating the Azure Cosmos DB account, container and item resources are projected in different ways. For example:

• With the SQL and MongoDB (document-oriented) APIs, containers are projected as containers and collections respectively, and items are projected as documents for both.

• With the Gremlin API, containers are projected as graphs and items are projected as nodes and edges. (Using the multi-model capabilities of Azure Cosmos DB, you can also query the nodes and edges as documents using the SQL API.)

• With the Table (key-value) API, containers are projected as tables and items are projected as rows.

• With the Cassandra (columnar) API, containers are projected as key-spaces and items are projected as rows.

Figure 4. The Azure Cosmos DB resource model.

The Azure Cosmos DB documentation provides more information on databases, containers, and items.
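As an illustrative sketch only (the real resources are managed by the service and addressed by URI; the names here are hypothetical), the account, database, container, and item hierarchy can be pictured as a simple in-memory structure:

```python
from dataclasses import dataclass, field

@dataclass
class Container:
    """Schema agnostic; holds items plus stored procedures, triggers, and UDFs."""
    name: str
    items: list = field(default_factory=list)

@dataclass
class Database:
    """Manages users, permissions, and containers."""
    name: str
    containers: dict = field(default_factory=dict)

@dataclass
class Account:
    """One DNS name, one API; the root of the resource hierarchy."""
    name: str
    databases: dict = field(default_factory=dict)

# Build a tiny hierarchy: account -> database -> container -> items
account = Account("my-cosmos-account")
account.databases["appdb"] = Database("appdb")
account.databases["appdb"].containers["orders"] = Container("orders")
```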
Partitioning and horizontal scalability

Azure Cosmos DB containers can provide virtually unlimited storage and throughput. They can scale from a few requests per second into the millions, and from a few gigabytes of data to several petabytes. But just how does Azure Cosmos DB achieve this? If you've ever sharded a database, you probably have an idea—including how complex it can get. With Azure Cosmos DB, the service does 99 percent of the work for you, as long as you choose a good partition key (more on this later).

Here's how it works: In Azure Cosmos DB, there are two types of partitioning: physical and logical. Physical partitioning, which is how Azure Cosmos DB delivers virtually unlimited storage and throughput, is built into Azure Cosmos DB and is transparent to you as a developer. Containers, which are logical resources, can span one or more physical partition sets (composed of replicas or servers), each of which has a fixed amount of reserved, SSD-backed storage. The number of physical partition sets across which a container is distributed is determined internally by Azure Cosmos DB based on the storage size and the throughput you've provisioned for the container.

All physical partition management, including that required to support scaling, is also fully managed by Azure Cosmos DB; when a container meets the partitioning prerequisites, the act of partitioning is transparent to your app, shielding you from a great deal of complexity. The service handles the distribution of data across physical and logical partitions and the routing of query requests to the right partition—without compromising the availability, consistency, latency, or throughput of an Azure Cosmos DB container. Again, you don't need to worry about physical partitioning—it's handled at runtime automatically by the service.

You should, however, be aware of how logical partitioning works in Azure Cosmos DB, including the importance of choosing a good partition key at design time. Here's why: the data within a container is horizontally distributed and transparently managed through the use of logical resource partitions (one per customer-specified partition key value), each of which is limited to 10 GB. For example, in the following diagram, a single container has three physical partition sets, each of which stores the data for one or more partition keys (in this example, LAX, AMS, and MEL). Each of the LAX, AMS, and MEL partition keys can't grow beyond the maximum logical partition limit of 10 GB.

Figure 5. A single Azure Cosmos DB container distributed over three logical resource partitions.
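To make the placement idea concrete, here is a minimal sketch of hash-based placement of logical partitions onto physical partition sets. The actual hashing scheme and partition count used by the service are internal; MD5 modulo a fixed count is just an illustrative stand-in.

```python
import hashlib

PHYSICAL_PARTITION_SETS = 3  # in reality, determined internally by the service

def physical_partition_for(partition_key_value: str) -> int:
    """Hash the partition key value to pick a physical partition set."""
    digest = hashlib.md5(partition_key_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % PHYSICAL_PARTITION_SETS

# Every item sharing a partition key value (one logical partition) maps to
# the same physical partition set; distinct values may share a set.
placement = {k: physical_partition_for(k) for k in ("LAX", "AMS", "MEL")}
```

Because routing is a pure function of the partition key, queries that include the key in their filter can be sent straight to the right partition instead of being fanned out.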
Many partition keys can be co-located on a single physical partition set and are automatically redistributed by the service as needed to accommodate growth in traffic, storage, or both. Because of this, you don't need to worry about having too many partition key values. In fact, having more partition key values is generally preferred to fewer partition key values. This is because a single partition key value will never span multiple physical partition sets.

The Azure Cosmos DB documentation provides an overview of partitioning in Azure Cosmos DB, as well as additional detail on physical and logical partitions.

Choosing a good partition key

Choosing a good partition key is a critical design decision, as it's really the only factor that can limit horizontal scalability. To scale effectively with Azure Cosmos DB, you'll need to pick a good partition key when you create your container. You should choose a partition key such that:

• The storage distribution is even across all the keys.

• The volume distribution of requests at a given point in time is even across all the keys.

• Queries that are invoked with high concurrency can be efficiently routed by including the partition key in the filter predicate.

In general, a partition key with higher cardinality is preferred because it typically yields better distribution and scalability. The Azure Cosmos DB documentation provides additional guidance for choosing a partition key. The first webinar in our Azure Cosmos DB Technical Training series also covers how partitioning works and how to choose a good partition key.

In selecting a partition key, you may want to consider whether to use unique keys to add a layer of data integrity to your database. By creating a unique key policy when a container is created, you ensure the uniqueness of one or more values per partition key. When a container is created with a unique key policy, it prevents the creation of any new or updated items with values that duplicate values specified by the unique key constraint. For example, in building a social app, you could make the user's email address a unique key—thereby ensuring that each record has a unique email address and no new records can be created with duplicate email addresses.

You may also want to consider the use of synthetic partition keys. Here's why: As stated above, it's considered a best practice to have a partition key with many distinct values, as a means of ensuring that your data and workload are distributed evenly across the items associated with those partition key values. If such a property doesn't exist in your data, you can construct a synthetic partition key in several ways, such as by concatenating multiple properties of an item, appending a random suffix (like a random number) at the end of a partition key value, or by using a pre-calculated suffix based on something you want to query.
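The three synthetic-key techniques can be sketched in a few lines. The property names and bucket count below are hypothetical; for the pre-calculated variant, a hash of the value (rather than a random number) is shown so the suffix can be recomputed at query time.

```python
import hashlib
import random

def concatenated_key(device_id: str, date: str) -> str:
    """Combine two item properties into one higher-cardinality key."""
    return f"{device_id}-{date}"

def random_suffix_key(base: str, buckets: int = 10) -> str:
    """Spread a hot key value across several logical partitions."""
    return f"{base}-{random.randint(0, buckets - 1)}"

def precalculated_suffix_key(base: str, buckets: int = 10) -> str:
    """Derive the suffix from the value itself, so it can be recomputed
    when you later need to query that partition."""
    suffix = int(hashlib.md5(base.encode("utf-8")).hexdigest(), 16) % buckets
    return f"{base}-{suffix}"
```

The trade-off: a random suffix distributes writes well but forces reads to fan out across all suffix buckets, while a pre-calculated suffix keeps point reads routable to a single logical partition.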
Request units and provisioned throughput

Request units per second (RU/s)—often represented in the plural form RUs—are the "throughput currency" of Azure Cosmos DB. To establish the throughput you'll need, you reserve a number of RUs, which are guaranteed to be available to your app on a per-second basis. As your app runs, each operation in Azure Cosmos DB (such as writing a document, performing a query, and updating a document) consumes CPU, memory, and IOPS—a blended measure of compute resources used, which is expressed in RUs.

The number of RUs for an operation is deterministic. Azure Cosmos DB supports various APIs that have different operations, ranging from simple reads and writes to complex graph queries. Because not all requests are equal, requests are assigned a normalized quantity of RUs, based on the amount of computation required to serve the request.

RUs let you scale your app's throughput with one simple dimension, which is much easier than separately managing CPU, memory, and IOPS. You can dynamically dial RUs up and down by using the Azure portal or programmatically, enabling you to avoid paying for spare capacity that you don't need. For example, if your database traffic is heavy from 9 AM to 5 PM, you can scale up your RUs for those hours and then scale back down for the remaining 16 hours when database traffic is light.

Figure 6. Request units are the normalized currency of throughput for various database operations.

The Azure Cosmos DB documentation provides more information on RUs, including how to specify RU capacity, variables to take into consideration when estimating the number of RUs to reserve, and what happens—and how to handle it—when your app exceeds reserved throughput.
Capacity planning made easy
For the developer, request units (RUs) immensely simplify capacity planning. Say you have an
on-premises database, which you’re running on a server that has a given amount of all three key
system resources: CPU, memory, and I/O. Now say you want to scale that database from 1,000
requests per second to 10,000 requests per second. How much more RAM do you buy? How
much CPU do you buy? Do you even know which is the bottleneck?
You might do some stress testing and find that RAM is indeed the bottleneck, but that doesn’t
necessarily tell you how much RAM translates to how many requests per second. Furthermore,
as soon as you add some RAM, CPU might become the bottleneck. And as you add more
processor cores, I/O might become the new bottleneck. Clearly, this approach gives you a very
difficult set of dimensions to scale against—and that’s for a single, monolithic server. Imagine
doing this for a distributed database.
Azure Cosmos DB uses a machine-learning model to provide a predictable RU charge for each
operation. So if you create a document today and it costs 5 RUs, then you can rest assured that
the same request will cost you 5 RUs tomorrow and the day after—inclusive of all background
processes. This lets you forecast required capacity with some basic “mental math,” using one
simple dimension.
For example, in the previous scenario, where you want to scale from 1,000 operations per second
to 10,000, you’ll need 10 times the number of RUs. So if it takes 5,000 RUs to support 1,000 writes
per second, you can rest assured that you can support 10,000 writes per second with 50,000 RUs.
It’s really that simple. Just provision the RUs that you’ll want, and Azure Cosmos DB will set aside
the necessary system resources for you, abstracting all the complexity in terms of CPU, memory,
and I/O.
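Because each operation's RU charge is deterministic, the "mental math" above reduces to a single proportion; a minimal sketch:

```python
def required_rus(baseline_rus: float, baseline_ops: float, target_ops: float) -> float:
    """RU capacity scales linearly with request rate for a fixed workload mix."""
    return baseline_rus * (target_ops / baseline_ops)

# If 5,000 RUs supports 1,000 writes per second, then 10,000 writes
# per second needs required_rus(5000, 1000, 10000) == 50,000 RUs.
```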
Understanding throughput requirements

By understanding your app's throughput requirements and the factors that affect RU charges, you can run your app as cost-effectively as possible. In estimating the number of RUs to provision, it's important to consider the following variables:

• Item size. As size increases, the number of RUs consumed to read or write the data also increases.

• Item property count. Assuming default indexing of all properties, the RUs consumed to write a document, node, or entity increase as the property count increases.

• Data consistency. Data consistency levels like Strong or Bounded Staleness (discussed later under Consistency) consume more RUs than other consistency levels when reading items.

• Indexed properties. An index policy on each container determines which properties are indexed by default. You can reduce RU consumption for write operations by limiting the number of indexed properties or by enabling lazy indexing.

• Query patterns. The complexity of a query affects how many RUs are consumed for an operation. The number of query results, the number of predicates, the nature of the predicates, the number of user-defined functions, the size of the
source data, and projections all affect the cost of query operations.

• Script usage. As with queries, stored procedures and triggers consume RUs based on the complexity of the operations being performed. As you develop your app, inspect the request charge header to better understand how each operation consumes RU capacity.

One method for estimating the amount of reserved throughput required by your app is to record the RU charge associated with running typical operations against a representative item used by your app. Then, estimate the number of operations you anticipate performing each second. Be sure to also measure and include typical queries and Azure Cosmos DB script usage. For example, these are the steps you might take:

1. Record the RU charge for creating (inserting) a typical item.

2. Record the RU charge for reading a typical item.

3. Record the RU charge for updating a typical item.

4. Record the RU charge for typical, common item queries.

5. Record the RU charge for any custom scripts (stored procedures, triggers, or user-defined functions) that the app uses.

6. Calculate the required RUs given the estimated number of each of the above operations you anticipate running in each second.

There's an Azure Cosmos DB capacity planner to help you estimate your throughput needs, and an article on finding the RU consumption for any operation executed against a container in Azure Cosmos DB.

Provisioning throughput on containers and databases

With Azure Cosmos DB, you can provision throughput at two granularities: at the container level and at the database level. At both levels, provisioned throughput can be changed at any time.

When you provision throughput at the container level, the throughput you provision is reserved exclusively for that container and is uniformly distributed across all of its logical partitions. If the workload running on a logical partition consumes more than its share of overall provisioned throughput, your database operations against that logical partition are rate-limited, which may result in a "429" throttling exception response that includes the amount of time (in milliseconds) that the app must wait before retrying the request. When this happens, for workloads that aren't sensitive to latency, in many cases you can simply let the application handle it as part of normal operations. The native SDKs (.NET/.NET Core, Java, Node.js, and Python) implicitly catch this response, respect the server-specified retry-after header, and retry the request. Unless your account is being accessed concurrently by multiple clients, the next retry will succeed. If retries become excessive, you can operationally scale the provisioned RUs for the container to support the capacity requirements of your app. Provisioning throughput at the container level is the most frequently used option for large-scale production applications that require guaranteed performance.

When you provision throughput at the database level, the throughput is shared across all the containers in the database. Although this guarantees you'll receive the provisioned throughput for that database all the time, because all containers within the database share the provisioned throughput, it doesn't provide any predictable throughput guarantees for any particular container. Instead, the portion of provisioned throughput that a specific container may receive is dependent on the number of containers, the choice of partition keys for those containers, and the workload distribution across the various logical partitions of those containers.

Provisioning throughput at the database level is a good starting point for development efforts and small applications. Should the need arise, you can always adopt a per-container provisioning model later. You can even combine the two throughput provisioning models, in which case the throughput provisioned at the database level is shared among all containers in the database for which throughput is not explicitly provisioned at the container level.

The Azure Cosmos DB documentation includes how-to guides for provisioning throughput at the container level and at the database level.
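The final calculation step of the estimation method above boils down to a rate-weighted sum of the per-operation charges you recorded; a sketch with hypothetical measured numbers:

```python
def estimated_rus(charges: dict, ops_per_sec: dict) -> float:
    """Sum RU charge per operation times the expected operations per second."""
    return sum(charges[op] * ops_per_sec.get(op, 0.0) for op in charges)

# Hypothetical measured charges (RUs) and anticipated rates (ops/sec):
charges = {"create": 10.0, "read": 1.0, "update": 10.5, "query": 25.0}
rates = {"create": 20, "read": 400, "update": 10, "query": 5}
# estimated_rus(charges, rates) -> 200 + 400 + 105 + 125 = 830.0 RU/s
```

Real charges vary with item size, indexing, and consistency level, so treat a result like this as a starting point and validate it against the request charge header under load.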
Global distribution

As illustrated in the following diagram, with Azure Cosmos DB, a customer's resources can be distributed along two dimensions. Within a given region, all resources are horizontally partitioned using resource partitions—called local distribution. If you've set up more than one region, each resource partition is also replicated across geographical regions—called global distribution.

Figure 7. A container can be both locally and globally distributed.

Global distribution of resources in Azure Cosmos DB is turnkey. At any time, outside of geo-fencing restrictions (for example, China and Germany), with a few button clicks or a single API call, you can associate any number of geographical regions with your Azure Cosmos DB account. Regardless of the amount of data or number of regions, Azure Cosmos DB guarantees (at the ninety-ninth percentile) that each newly associated region will start processing client requests within 30 minutes for up to 100 TB of data, which is achieved by parallelizing the seeding and copying of data. You can also remove an existing region or take a region that was previously associated with your Azure Cosmos DB account "offline."

When you elastically scale throughput or storage for a globally distributed database, Azure Cosmos DB transparently performs the necessary partition management operations across all the regions, continuing to provide a single system image, independent of scale, distribution, or any failures.

Azure Cosmos DB supports both explicit and policy-driven failovers, allowing you to control the end-to-end system behavior in the event of failures. In the rare event of an Azure regional outage or datacenter outage, Azure Cosmos DB automatically triggers failovers of all Azure Cosmos DB accounts with a presence in the affected region. You can also manually trigger a failover, as may be required to validate the end-to-end availability of your app. Because both the safety and liveness properties of the failure detection and leader election are guaranteed, Azure Cosmos DB guarantees zero data loss for a tenant-initiated, manual-failover operation. (Azure Cosmos DB guarantees an upper bound on data loss for a system-triggered, automatic failover due to a regional disaster.)

Upon regional failover, you won't need to redeploy your app. Azure Cosmos DB lets your app interact with resources using either logical (region-agnostic) or physical (region-specific) endpoints—the former ensuring that your app can transparently be multihomed in case of failover, and the latter providing fine-grained control for the app to redirect reads and writes to specific regions.

With a multi-region account, Azure Cosmos DB guarantees 99.999 percent availability, regardless of data volumes, specified throughput, the number of regions associated with your Azure Cosmos DB account, or the distance between the geographical regions associated with your database. The same holds true for SLAs around consistency, availability, and throughput.
Figure 8. You can configure global distribution using the Azure portal, with just a few clicks.

Clearly, the global distribution enabled by Azure Cosmos DB can help ensure high availability. But there's a second very good reason to use it: app responsiveness. You can't break the speed of light, which means requesting data that's stored halfway across the globe is going to take a lot longer than requesting data that resides significantly closer to you. Even under ideal network conditions, sending a packet halfway across the globe can take hundreds of milliseconds.

However, you can cheat the speed of light by using data locality, taking advantage of Azure Cosmos DB global replication to put copies of your data in strategic locations that are close to your users. Content delivery networks (CDNs) have employed this approach successfully for years when it comes to static content; now Azure Cosmos DB lets you do the same thing for dynamic content. Through such an approach, you can often achieve data retrieval times that are lower than 10 milliseconds—an order of magnitude reduction. And if your app is making several round trips to the database, that can mean a big difference in the user experience.

The Azure Cosmos DB documentation provides more information on global distribution, including how to associate your Azure Cosmos DB account with any number of regions using the Azure portal or the Azure Cosmos DB resource provider's REST APIs.

Multi-master replication

Azure Cosmos DB supports multi-master replication, meaning that it supports multi-region writes. When you configure multi-region writes, you can write data to any region associated with your Azure Cosmos DB account and have those updates propagate asynchronously, enabling you to seamlessly scale both write and read throughput around the world—with single-digit millisecond write latencies at the ninety-ninth percentile and 99.999 percent write availability (compared to 99.99 percent write availability for single-region writes). To use the multi-master feature in your application, you'll need to enable multi-region writes and configure the multi-homing capability in Azure Cosmos DB.

When you use multi-region writes, you'll need to take into account update conflicts, which can occur when the same item is concurrently updated in multiple regions. Azure Cosmos DB provides a flexible means of dealing with such conflicts, allowing you to choose from two resolution policies:

• Last write wins. By default, this conflict resolution policy uses a system-defined timestamp property based on the time-synchronization clock protocol. If you're using the SQL API, you can also specify any other custom numerical property (e.g., your own notion of a timestamp) to be used for conflict resolution—sometimes referred to as the conflict resolution path.

• Custom. This resolution policy, which is available only for SQL API accounts, supports application-defined semantics for conflict reconciliation. When you set this policy, you'll also need to register a merge stored procedure, which is automatically invoked when conflicts are detected. If you don't register a merge procedure on the container, or the merge procedure throws an exception at runtime (Azure Cosmos DB provides an exactly-once guarantee for the execution of a merge procedure as part of the commitment protocol), the conflicts are written to the conflicts feed for manual resolution by your application.

The Azure Cosmos DB documentation provides more information on conflict types and resolution policies, as well as a how-to article on managing conflict resolution policies.

How provisioned throughput is distributed across multiple regions

Earlier in this e-book, we discussed how provisioned throughput is expressed in RU/s, or RUs, which measure the "cost" of both read and write operations against an Azure Cosmos DB container. If you provision 'R' RUs on a container (or database), Azure Cosmos DB ensures that 'R' RUs are available in each region associated with your Azure Cosmos DB account. Each time you add a new region to your account, Azure Cosmos DB automatically provisions 'R' RUs in the newly added region.

Assuming that a Cosmos container is configured with 'R' RUs and there are 'N' regions associated with the Cosmos account, then:

• If the Azure Cosmos DB account is configured with a single write region, the total RUs available globally on the container = R x N.

• If the Azure Cosmos DB account is configured with multiple write regions, the total RUs available globally on the container = R x (N+1). The additional R RUs (i.e., the "+1" part) are automatically provisioned to process cross-region update conflicts and anti-entropy traffic.

It's also worth noting that your choice of consistency level (discussed next) also affects throughput. In general, you'll get greater read throughput for the more relaxed consistency levels (e.g., session, consistent prefix, and eventual consistency) compared to stronger consistency levels (e.g., bounded staleness or strong consistency).
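The two formulas for globally provisioned RUs can be captured directly:

```python
def total_global_rus(r: float, n: int, multi_region_writes: bool) -> float:
    """Total RUs provisioned globally for a container with R RU/s and N regions:
    R x N with a single write region, R x (N + 1) with multiple write regions
    (the extra R covers cross-region conflict and anti-entropy traffic)."""
    return r * (n + 1) if multi_region_writes else r * n

# 10,000 RU/s on a container replicated to 3 regions:
# single write region -> 30,000 RUs; multi-master -> 40,000 RUs
```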
Consistency

Azure Cosmos DB makes using geographic distribution to build low-latency global apps not only possible, but also easy. As a write-optimized database, it offers multiple, intuitive, tunable consistency levels to give you read predictability, let you make the right tradeoffs between consistency and latency, and help you correctly implement your geographically distributed app. But why is consistency so important, and why are the five consistency levels provided by Azure Cosmos DB better than what you can get with any other NoSQL database?

Most commercially available distributed databases fall into two categories: either they don't offer well-defined, provable consistency choices, or they offer only the two extremes: strong consistency or eventual consistency. Systems that fall into the first category burden developers with the minutiae of replication protocols and make difficult tradeoffs between consistency, availability, latency, and throughput. Systems in the second category force developers to choose between the two extremes, neither of which is optimally suited for many of the real-world scenarios that are driving the development of globally distributed apps.

For example, while the strong consistency level is the gold standard of programmability, it comes at the steep price of much higher latency (in steady state), reduced availability (in the face of failures), and lower read scalability. Conversely, with eventual consistency, you'll get great performance but a complete lack of predictability when it comes to whether you're getting the latest and greatest data. Despite an abundance of research and proposals, the distributed database community has not been able to commercialize consistency levels beyond strong and eventual consistency—that is, until now.

Azure Cosmos DB allows you to choose from five well-defined consistency levels, spanning the spectrum from strong to eventual with three intermediate levels: bounded staleness, session, and consistent prefix. You can specify the default consistency level for an Azure Cosmos DB account, which will apply to all data within all partition sets across all regions. If you want, you can override the default consistency level on a per-request basis.

Figure 9. Azure Cosmos DB offers five well-defined consistency levels, enabling you to fine-tune the tradeoffs between consistency and latency for your app.
The following table captures the guarantees and characteristics associated with each consistency level.

Strong
• Offers a linearizability guarantee, with reads guaranteed to return the most recent version of an item.
• Guarantees that a write is only visible after it's committed durably by the majority quorum of replicas. A write is either synchronously committed durably by both the primary and the quorum of secondaries, or it's abandoned. A read is always acknowledged by the majority read quorum; a client can never see an uncommitted or partial write and is always guaranteed to read the latest acknowledged write.
• The cost of a read operation (in terms of request units consumed) with strong consistency is higher than session and eventual, but the same as bounded staleness.

Bounded staleness
• Guarantees that reads may lag behind writes by at most K versions or prefixes of an item, or a time interval t. (When choosing bounded staleness, the "staleness" can be configured in two ways: the number of versions K of the item by which the reads lag behind the writes, and the time interval t.)
• Offers total global order except within the "staleness window." The monotonic read guarantees exist within a region both inside and outside the "staleness window."
• Provides a stronger consistency guarantee than session, consistent prefix, or eventual consistency. For globally distributed apps, we recommend that you use bounded staleness for scenarios where you would like to have strong consistency but also want 99.999 percent availability and low latency.
• The cost of a read operation (in terms of RUs consumed) with bounded staleness is higher than session and eventual consistency, but the same as strong consistency.

Session
• Unlike the global consistency offered by the strong and bounded-staleness consistency levels, session consistency is scoped to a client session.
• Ideal for all scenarios where a device or user session is involved because it guarantees monotonic reads, monotonic writes, and read-your-own-writes (RYW) guarantees.
• Provides predictable consistency for a session and maximum read throughput while offering the lowest-latency writes and reads.
• The cost of a read operation (in terms of RUs consumed) is less than strong and bounded staleness, but more than eventual consistency.

Consistent prefix
• Guarantees that in the absence of any further writes, the replicas within the group eventually converge.
• Guarantees that reads never see out-of-order writes. If writes were performed in the order A, B, C, then a client sees either A; A, B; or A, B, C, but never an order like A, C or B, A, C.

Eventual
• Guarantees that in the absence of any further writes, the replicas within the group eventually converge.
• Is the weakest form of consistency, where a client might get values that are older than the ones it had seen before.
• Provides the weakest read consistency but offers the lowest latency for both reads and writes.
• The cost of a read operation (in terms of RUs consumed) is the lowest of all the Azure Cosmos DB consistency levels.
All consistency levels are supported by our consistency SLAs. To report any violations, we employ a linearizability checker, which continuously operates over our service telemetry. For bounded staleness, we monitor and report any violations to K and t bounds. For all four relaxed consistency levels, we track and report the probabilistic bounded-staleness (PBS) metric among other metrics.

The Azure Cosmos DB documentation provides more information on consistency levels, including how to configure the default consistency level for an Azure Cosmos DB account, guarantees associated with consistency levels, and consistency levels explained through an example based on baseball scores. The documentation also includes articles on:
• Choosing the right consistency level based on which API you’re using, as well as practical considerations related to consistency guarantees.
• Mapping between Apache Cassandra or MongoDB and Azure Cosmos DB consistency levels. (When using the SQL API, Gremlin API, and Table API, the default consistency level configured on the Azure Cosmos DB account is used.)
• Consistency, availability, and performance tradeoffs—including those between consistency levels and latency, consistency levels and throughput, and consistency levels and data durability.

Choosing a consistency level
The following scenarios illustrate when each consistency level might be appropriate:
• Strong consistency ensures that you’ll never see a stale read, making it a good fit for
scenarios like transaction processing, such as updating the state of an order.
• Bounded-staleness gives you an SLA on “how eventual is eventual?” You can think of this as
a window within which stale reads are possible, which you can configure in terms of time or
number of operations. Outside of this window, strong consistency is guaranteed.
• Session consistency is the sweet spot for most apps; it provides a means of scoping strong
consistency down to a single session, without paying the performance penalty associated with
global strong consistency. Take the case of a user posting a comment on Facebook: when the
page is refreshed, if the user doesn’t see his or her post, that user might repeat the process—
only to see multiple copies. With session consistency, where reads follow your own writes
within a session, you can avoid this.
• Consistent prefix is good when you can handle some latency, as long as you’ll never see out-
of-order updates. A group chat app is a good example. Say you have Alice and Bob organizing
dinner, and saying “What time should we meet? How about 7:00? I’m busy then. How about
8:00? That’s great—let’s meet then.” If Carol and Dan are also part of the group chat and see
these messages out of order, they might not arrive at the proper time. With consistent prefix,
you can ensure that the messages arrive in the correct order.

• Eventual consistency is a good choice where low latency matters above all else. Again, let’s
use the example of a Facebook post. You want it to load quickly and aren’t very concerned
whether the number of “Likes” takes into account every Like by every user up to that moment
in time, around the world.
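The read-your-own-writes guarantee behind session consistency can be sketched with a session token. This is an illustrative model only; the Replica class and token scheme here are hypothetical, not the actual SDK API.

```python
# Illustrative sketch of read-your-own-writes via a session token: a read
# is only served by a replica that has caught up to the client's token.
class Replica:
    def __init__(self):
        self.docs = {}
        self.lsn = 0                      # sequence number of the last applied write

    def write(self, key, value):
        self.lsn += 1
        self.docs[key] = value
        return self.lsn                   # handed back to the client as its session token

    def read(self, key, session_token):
        if self.lsn < session_token:
            raise RuntimeError("replica not caught up; retry another replica")
        return self.docs[key]

replica = Replica()
token = replica.write("comment", "hello")
assert replica.read("comment", token) == "hello"   # the writer always sees its write

stale = Replica()                                  # a replica that hasn't replicated yet
try:
    stale.read("comment", token)
except RuntimeError:
    pass                                           # the client retries elsewhere instead of missing its post
```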
About 73 percent of Azure Cosmos DB tenants use session consistency, while 20 percent prefer
bounded staleness. Also, approximately 3 percent of customers initially experiment with various
consistency levels before settling on a choice for their app, and only 2 percent of customers
override consistency levels on a per-request basis.

For more information on consistency levels in Azure Cosmos DB, check out the interactive e-book.

Automatic indexing
Azure Cosmos DB provides automatic indexing, which you can tune and configure. It works across
every data model, automatically indexing every property of every record by default. You won’t need
to define schemas and indexes up front or manage them over time, so you’ll never have to do an alter
table or create index operation. Automatic indexing is based on a latch-free data structure and is
designed to run on the Azure Cosmos DB write-optimized database engine, enabling automatic
indexing while sustaining high data-ingestion rates.
Automatic indexing is made possible through the use of an inverted index. Here’s how it works, using
the two JSON documents in Figures 10 and 11 as an example. Note that the two records have different
schemas. Document 1 has a set of exports that have only a city property. Document 2, on the other
hand, has a set of exports where some of the cities also have a set of dealers.

Figure 10. Sample JSON document 1.

Figure 11. Sample JSON document 2.

Behind the scenes, using its ARS-based data model, Azure Cosmos DB models these records as trees:
the root node is the document ID (or record ID), the properties under the root become the child nodes,
and the instance values become the leaf nodes. The result is two different trees, each representing a
different record. This schema never goes away and is always defined at a record level.
By merging these trees into an inverted index, we can establish pointers to the actual underlying set
of records for our query results, as shown in Figure 12.

Figure 12. Inverted index based on sample documents 1 and 2.
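The merge described above can be sketched as building a map from (path, value) pairs to record IDs. This is an illustrative model only; the city and dealer values below are stand-ins for the contents of Figures 10 and 11, not the actual documents.

```python
# Illustrative sketch of an inverted index over JSON paths: each
# (path, value) pair maps to the set of record IDs that contain it.
def index_doc(index, doc_id, node, path=""):
    if isinstance(node, dict):
        for key, value in node.items():
            index_doc(index, doc_id, value, f"{path}/{key}")
    elif isinstance(node, list):
        for item in node:                 # array items share their parent's path
            index_doc(index, doc_id, item, path)
    else:
        index.setdefault((path, node), set()).add(doc_id)   # leaf value

index = {}
index_doc(index, 1, {"exports": [{"city": "Berlin"}, {"city": "Athens"}]})
index_doc(index, 2, {"exports": [{"city": "Berlin", "dealers": [{"name": "Hans"}]}]})

assert index[("/exports/city", "Berlin")] == {1, 2}     # shared path, both records
assert index[("/exports/dealers/name", "Hans")] == {2}  # unique to record 2
```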

Now, say we run a query on this container to find all records where the country location is Germany. By traversing the left side of the tree, we find that Germany has a pointer to records 1 and 2 and can efficiently return records 1 and 2 in the query result set. Similarly, by traversing the right side of the tree, we can determine that a dealer with the name Hans is unique to record 2.

One nice thing about this approach is that, because most of the paths or properties in a set of records will have a high degree of commonality, we can automatically index every property of every record while still achieving high compression rates to minimize storage overhead.

By default, Azure Cosmos DB indexes everything. However, you can specify a set of included paths and a set of excluded paths. So, we include the path /*. More specific paths will override entries with less specific paths, so you can always include /* and exclude a set of paths that you know you’ll never need to query on—essentially, taking an opt-out approach. If you want to take a more traditional approach to indexing, you can define a set of paths to be included.

The Azure Cosmos DB documentation provides more information on indexing in Azure Cosmos DB and working with indexing policies.
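The opt-out approach can be sketched with a toy matcher. The path syntax here is deliberately simplified; real indexing policies use JSON path expressions with their own wildcard rules, so treat this as an assumption-laden illustration rather than the service’s matching logic.

```python
# Illustrative sketch of the opt-out indexing approach: include /* and
# exclude specific paths; the more specific (excluded) entry wins.
policy = {
    "includedPaths": ["/*"],
    "excludedPaths": ["/metadata/*"],     # paths we know we'll never query on
}

def is_indexed(path, policy):
    # An excluded prefix overrides the broad /* inclusion.
    for excluded in policy["excludedPaths"]:
        if path.startswith(excluded.rstrip("*")):
            return False
    return any(path.startswith(inc.rstrip("*")) for inc in policy["includedPaths"])

assert is_indexed("/exports/city", policy)        # covered by /*
assert not is_indexed("/metadata/etag", policy)   # opted out
```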

Architectural considerations

Change feed
A common design pattern in many applications is to use changes to the data to trigger additional actions, with IoT, gaming, retail, and operational logging applications all being good examples. Change feed support in Azure Cosmos DB makes building such apps easy. It works by listening to an Azure Cosmos DB container for any changes and provides them as a sorted list of documents that were changed, in the order in which they were modified. Change feed is available for the container as a whole, or for each logical partition key within the container. Change feed output can be distributed across one or more consumers for parallel processing.

Following are just a few of the ways you can put change feed to use:
• Triggering a notification or a call to an API when an item is inserted or updated.
• Real-time stream processing for IoT, or real-time analytics processing on operational data.
• Additional data movement, by either synchronizing with a cache, a search engine, or a data warehouse, or by archiving data to cold storage.

Figure 13. Azure Cosmos DB change feed support makes it easy to build applications that use changes to data to trigger additional actions.
You can work with change feed by using any of the following options.
• Azure Functions. This is the simplest (and recommended) option. When you create an Azure Cosmos DB trigger in an Azure Functions application, the Azure Function gets triggered whenever there’s a change to the specified container. Building serverless apps, the next topic in this e-book, discusses using Azure Functions together with Azure Cosmos DB in greater detail. (NOTE: Currently, the Azure Cosmos DB trigger is supported for use with the core SQL API only. For all other Azure Cosmos DB APIs, you should access the database from your function by using the static client for your API.)
• Azure Cosmos DB SQL API SDK. The Azure Cosmos DB change feed processor library within the Azure Cosmos DB SQL API SDK gives you complete, low-level control of the change feed while shielding you from excess complexity. It follows the observer pattern, where your processing function is called by the library. If you have a high-throughput change feed or want to distribute event processing across multiple consumers for other reasons, you can use the change feed processor library to automatically divide the load among the different clients—without you having to write that code. If you want to build your own load balancer, you can use the change feed processor library to implement a custom partition strategy. The SDK can be downloaded here. (Note the drop-down menu at the top of the page, which provides access to SDKs for other languages.)

The Azure Cosmos DB documentation provides more information on change feed, including how it works with different operations, use cases and scenarios, ways to work with change feed, and key features of change feed.

Building serverless apps with Azure Cosmos DB and Azure Functions
Serverless computing is all about the ability to focus on individual pieces of logic that are repeatable and stateless; they require no infrastructure management, and they consume resources only for the seconds, or milliseconds, they run for. At the core of serverless computing are functions, which are made available in the Azure ecosystem by Azure Functions. With the native integration between Azure Cosmos DB and Azure Functions, you can create database triggers, input bindings, and output bindings directly from your Azure Cosmos DB account—making it easy to create and deploy event-driven serverless apps with low-latency access to rich data for a global user base.

Azure Cosmos DB and Azure Functions let you integrate your databases and serverless apps in the following ways:
• You can create an event-driven Azure Cosmos DB trigger in Azure Functions. This trigger relies on change-feed streams to monitor your Azure Cosmos DB container for changes. When any changes are made to a container, the change-feed stream is sent to the trigger, which invokes the function.
• Alternatively, you can bind a function to an Azure Cosmos DB container using an input binding, which reads data from a container when a function executes.
• You can also bind a function to an Azure Cosmos DB container using an output binding, which writes data to a container when a function completes.

The Azure Cosmos DB documentation provides more information on using Azure Functions to integrate your databases and serverless apps and on Azure Functions bindings.
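The observer pattern used by the change feed processor, described above, can be sketched with an in-memory feed and a continuation token. All names here are hypothetical; the real library also manages leases and checkpoints across consumers for you.

```python
# Illustrative sketch of a change feed consumer: the feed is an ordered
# log of changed documents, and a continuation token tracks progress.
class InMemoryChangeFeed:
    def __init__(self):
        self.log = []                      # ordered list of changed documents

    def append(self, doc):
        self.log.append(doc)

    def read_from(self, continuation):
        # Return all changes after the continuation point, plus a new token.
        return self.log[continuation:], len(self.log)

def process(feed, continuation, handler):
    changes, continuation = feed.read_from(continuation)
    for doc in changes:
        handler(doc)                       # your processing function is called back
    return continuation

feed = InMemoryChangeFeed()
feed.append({"id": "1", "status": "created"})
feed.append({"id": "1", "status": "shipped"})

seen = []
cont = process(feed, 0, seen.append)
assert [d["status"] for d in seen] == ["created", "shipped"]   # sorted by modification order
assert process(feed, cont, seen.append) == 2                   # no new changes yet
```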
Server-side programming
Azure Cosmos DB supports language-
integrated, transactional execution of
JavaScript when using the SQL API in Azure
Cosmos DB. This allows you to write stored
procedures, triggers, and user-defined
functions (UDFs) in JavaScript, then have them
execute within the database engine. You can
create and execute triggers, stored
procedures, and UDFs by using the Azure
portal, the JavaScript query API in Azure
Cosmos DB, or the Azure Cosmos DB SQL API
client SDKs.
Stored procedures and triggers provide a
means of executing multi-document
transactions—a sequence of operations
performed as a single logical unit of work. In
Azure Cosmos DB, the JavaScript runtime is
hosted inside the database engine, which
means requests made within the stored
procedures and the triggers execute in the
same scope as the database session. This
enables Azure Cosmos DB to guarantee ACID
properties for all operations that are part of a
stored procedure or a trigger.
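The all-or-nothing behavior of such a transaction can be sketched in a few lines. This is a conceptual model of transactional scope only; real stored procedures run as JavaScript inside the database engine, not as client-side Python.

```python
# Illustrative sketch of multi-document transaction semantics: every
# operation in the unit of work commits, or none of them do.
def run_transaction(store, operations):
    snapshot = dict(store)        # cheap copy standing in for a rollback log
    try:
        for op in operations:
            op(store)
    except Exception:
        store.clear()
        store.update(snapshot)    # roll back to the pre-transaction state
        raise

store = {"orders": 1}
run_transaction(store, [lambda s: s.update(orders=2),
                        lambda s: s.update(shipped=True)])
assert store == {"orders": 2, "shipped": True}   # both writes committed

def conflicting(s):
    s["orders"] = 99
    raise RuntimeError("simulated failure mid-transaction")

try:
    run_transaction(store, [conflicting])
except RuntimeError:
    pass
assert store == {"orders": 2, "shipped": True}   # partial write rolled back
```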
The Azure Cosmos DB documentation
provides more information on server-side
programming with Azure Cosmos DB,
including a discussion of benefits and
additional detail on transactions, bounded
execution, triggers, user-defined functions,
and the JavaScript language integrated query
API. There’s a separate article that includes
supported JavaScript functions in the
JavaScript query API and a SQL to JavaScript
cheat sheet.

Apache Spark to Azure Cosmos DB Connector
The Apache Spark to Azure Cosmos DB Connector lets you run Spark jobs on the data stored in Azure Cosmos DB. You can use the connector with Azure Databricks or Azure HDInsight, which provide managed Spark clusters on Azure. You can also use it with your own Spark deployment. The Apache Spark to Azure Cosmos DB Connector provides a low-latency data source for Spark that works for both batch and stream processing.

Built-in operational analytics with Apache Spark (in preview)
More recently, we announced a limited preview of built-in operational analytics in Azure Cosmos DB using Apache Spark. This allows you to run analytics from Apache Spark against data stored in an Azure Cosmos account without a connector, instead providing native support for Apache Spark jobs within Azure Cosmos DB. Capabilities also include built-in support for Jupyter notebooks, which run within Azure Cosmos DB accounts.

Built-in support for Apache Spark in Azure Cosmos DB will provide several advantages, beginning with the fastest time to insight for geographically distributed users and data. You can also simplify your analytics architecture and lower its TCO, as the system will have the least number of data processing components and avoid any unnecessary data movement among them. Scalability will be built-in, and you’ll have a security, compliance, and auditing boundary that encompasses all the data under management. Finally, you’ll be able to deliver highly available analytics backed by stringent SLAs.

The Azure Cosmos DB documentation provides more information on its built-in support for Apache Spark. Again, it’s currently in limited preview, so if you want to try it out, you’ll need to sign up for it here.
Operational considerations

Cost optimization with Azure Cosmos DB
The pricing model for Azure Cosmos DB simplifies cost management and planning, in that you pay only for the throughput you’ve provisioned (in RUs) and the storage that you consume. It’s just one of the many reasons why Azure Cosmos DB delivers such a compelling total cost of ownership (TCO).

That said, just because Azure Cosmos DB delivers a great TCO, it doesn’t mean that you shouldn’t try to get the very most out of the resources you’re paying for. The Azure Cosmos DB documentation includes numerous articles to help you optimize TCO—from understanding your bill to optimizing the cost of provisioned throughput. You’ll also find articles on optimizing costs in relation to queries, storage, reads and writes, geographic distribution, development/test, and reserved capacity.

Security
Azure Cosmos DB includes numerous features and capabilities designed to help you prevent, detect, and respond to database breaches. That said, there are a few worth calling out here:
• Data encryption. All data is encrypted at rest and during transport, by default and at no additional cost.
• Secure access. With Azure Cosmos DB, data access is secured in several ways. Administrative resources (Azure Cosmos DB accounts, databases, users, and permissions) are secured using master keys. Application resources (containers, documents, attachments, stored procedures, triggers, and UDFs) are secured using resource tokens.
• IP firewall. By default, an Azure Cosmos DB account is accessible from the internet, as long as the request is accompanied by a valid authorization token. Configurable IP-based access controls in Azure Cosmos DB provide an additional layer of security, enabling access only from approved machines and/or cloud services (which still need a valid authorization token).
• Access from virtual networks. You can configure an Azure Cosmos DB account to allow access only from a specific subnet of a virtual network (VNet). When you do this, only requests originating from those subnets will get a valid response; requests originating from any other source will receive a 403 (Forbidden) response.
• Role-based access control. Azure Cosmos DB provides built-in role-based access control (RBAC) for common management scenarios. An individual with a profile in Azure Active Directory can grant or deny access to resources (and operations on Azure Cosmos DB resources) by assigning these RBAC roles to users, groups, service principals, or managed identities. Role assignments are scoped to control-plane access only, which includes access to Azure Cosmos accounts, databases, containers, and offers (throughput).

Online backup and restore
Azure Cosmos DB automatically takes backups of your data at regular intervals, which is done without affecting the performance or availability of database operations. All backups are stored separately in Azure Blob storage, with those backups geographically replicated to protect against regional disasters. These automatic backups can be
helpful if you accidentally delete or update your Azure Cosmos account, database, or container and need to recover that data. Azure Cosmos DB takes snapshots of your data every four hours. At any given time, only the last two snapshots are retained. However, if a container or database is deleted, Azure Cosmos DB retains existing snapshots of that container or database for 30 days.

With Azure Cosmos DB SQL API accounts, you can also maintain and manage your own backups. You can use Azure Data Factory to periodically output any data to any Azure Data Factory-supported storage destination, or you can use the Azure Cosmos DB change feed to read data periodically (for full backups and/or incremental changes) and store that data in an Azure Blob storage account.

The Azure Cosmos DB documentation provides more information on online backup and restore, including options to manage your own backups, backup retention, restoring data from online backups, and migrating restored data to the original Azure Cosmos DB account. (Although it’s possible to use the restored account as the live account, it’s not a recommended option for production workloads.)

Compliance
To help customers meet their own compliance obligations across regulated industries and markets worldwide, Azure maintains the largest compliance portfolio in the industry in terms of both breadth (total number of offerings) and depth (number of customer-facing services in assessment scope). These compliance offerings are grouped into four segments (globally applicable, US Government, industry specific, and region or country/region specific) and are based on various types of assurances, including formal certifications, attestations, validations, authorizations, and assessments produced by independent third-party auditing firms, as well as contractual amendments, self-assessments, and customer guidance documents produced by Microsoft. The Azure Cosmos DB documentation provides a comprehensive list of compliance certifications.

Building an app with Azure Cosmos DB
Choosing the right API
It’s easy to get up and running with Azure Cosmos DB. If you haven’t used it before, we provide a wealth of getting started resources on the following pages. Either way, the first thing you’ll need to do is choose an API; each Azure Cosmos DB account supports one API, which you’ll need to specify when creating an account, and each tutorial is specific to an API. Read on for advice on when to consider each API, followed by how to get started with the one you choose.

Short on time? Try a 5-minute quickstart
Our 5-minute quickstarts can help you get started with Azure Cosmos DB in the time it takes to grab a cup of coffee. They’re organized by API and programming language, so you shouldn’t have trouble finding one that sparks your interest.
• SQL API: .NET, .NET Preview, Java, Node.js, Python, Xamarin
• MongoDB API: .NET, Java, Node.js, Python, Xamarin, Golang
• Gremlin API: .NET, Gremlin console, Java, Node.js, Python, PHP
• Table API: .NET, Java, Node.js, Python
• Cassandra API: .NET, Node.js, Java, Python

Figure 14. When you create an Azure Cosmos DB account, you’ll need to choose an API.

One database, multiple APIs
Azure Cosmos DB natively supports multiple data models and APIs, which we’re continuing to add. It does this through an atom-record-sequence (ARS) based core type system, where atoms consist of a small set of primitive types (such as string, bool, and number), records are structs, and sequences are arrays consisting of atoms, records, or sequences.

The Azure Cosmos DB database engine translates and projects supported data models onto this core ARS-based data model, which is accessible from dynamically typed programming languages and can be exposed as-is using JSON or other similar representations. The same design also enables native support for multiple APIs—enabling developers to build their apps using popular open-source APIs with all the benefits of an
enterprise-grade, fully managed, globally distributed database system.

Choosing an API
At this point, you might be thinking, “Multiple APIs give me options, but which do I choose?” The answer depends on your data, and what you want to do with it. Here are some general guidelines:
• Document databases (supported by the SQL and MongoDB APIs) store and retrieve documents, which are typically blocks of XML or JSON. Documents are self-describing (containing a description of the data type and a value for that description) and employ a hierarchical, tree-based structure, and they don’t all have to be the same. At a high level, you can think of a document database as a key-value database where the value part (the document) can be examined via a query language.
• Graph databases (supported by the Gremlin API) allow you to store entities and the relationships between them. Entities (aka nodes) can have properties, just like the instance of an object in an app. Relations with other nodes (aka edges) can also have properties. With a graph database, you can store data once and then interpret the relationships within it in different ways very quickly and efficiently, because the relationship between nodes is persisted instead of calculated at query time.
• Column family databases (supported by the Cassandra API) store data in groups of related information that are often accessed together—such as a customer name, street address, and postal code. A column family is a set of rows (each with a row key) and an associated set of columns. Various rows in the column family don’t need to contain the same columns, and columns can be added to one row without having to add them to other rows.
• Key-value databases (supported via the Table API) are relatively simple to use. From an API perspective, the database client can either Put, Get, or Delete values by keys. The value is a blob; the database doesn’t care what’s inside it. They’re good for storing semi-structured data, providing a flexible data schema where the information in different rows can have different structures.
With five APIs to choose from, odds are that you’ll have one optimized to the task at hand. For example,
key-value databases are great for doing lookups with known values such as state code or postal code,
or even for pulling user preferences for a signed-in user. Similarly, column databases are highly efficient
at projections—that is, obtaining a few properties from a document that has many. Graph databases
are helpful for visualizing both connected or networked data sets and hierarchical data relationships—
family trees and flights between airports are good examples.
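The contrast between the key-value and document models can be sketched as follows (illustrative data only; the keys and fields are hypothetical):

```python
# Illustrative sketch: key-value stores treat the value as an opaque blob,
# while document stores can examine the value part with a query language.
kv = {}
kv["user:42"] = b'{"theme": "dark"}'            # opaque blob; Put/Get/Delete by key
assert kv["user:42"] == b'{"theme": "dark"}'    # lookup by known key is the whole API

docs = [
    {"id": "1", "state": "WA", "zip": "98052"},
    {"id": "2", "state": "OR", "zip": "97035"},
]
# A document database can filter on the contents of the value itself.
assert [d["id"] for d in docs if d["state"] == "WA"] == ["1"]
```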

Get started with a free Azure subscription


To get started with Azure Cosmos DB, you need a Microsoft Azure subscription. If you don’t have
one, you can sign up for a free Azure subscription in just a few minutes. You can also try Azure
Cosmos DB for free without an Azure subscription, free of any charges or commitments.

Getting Started with the SQL API
If you’ve chosen the SQL API, then you’ll likely be modeling document data. And while schema-free databases like Azure Cosmos DB make it easy to change your data model, that doesn’t mean you shouldn’t spend some time thinking about it first. In doing so, you may want to ask yourself some basic questions: How will my data be stored? How will my app retrieve and query that data, and is the app read-heavy or write-heavy? Data modeling in Azure Cosmos DB provides a good overview—and some general rules-of-thumb—when it comes to modeling document data.

Quickstarts
Our 5-minute quickstarts can help you get started with the Azure Cosmos DB SQL API in the time it takes to walk down the hall for a cup of coffee. They’re organized by programming language, so you shouldn’t have trouble finding one that sparks your interest.

• .NET: Create an Azure Cosmos DB SQL API account, document database, and container using the Azure portal, and then build and deploy a web app built on the SQL .NET API.
• Java: Create an Azure Cosmos DB SQL API account using the Azure portal, and then create a Java app using the SQL Java SDK and add resources to your Azure Cosmos DB account by using the Java application.
• Node.js: Create an Azure Cosmos DB SQL API account, document database, and container using the Azure portal, and then build and run a console app built on the SQL JavaScript SDK.
• Python: Create an Azure Cosmos DB SQL API account, document database, and container using the Azure portal, and then build and run a console app built with the Python SDK for SQL API.
• Xamarin: Create an Azure Cosmos DB SQL API account, document database, and container using the Azure portal, and then build and deploy a web app built on the SQL .NET API and Xamarin, utilizing Xamarin.Forms and the MVVM architectural pattern.

you’re starting from scratch and include


Tutorials instructions for creating an account. All of the
When you’re ready to dive deeper into the other tutorials assume you already have an
SQL API, the following tutorials are a great account.
place to start. The tutorials listed under
Creating an Azure Cosmos DB account assume

Creating an Azure Cosmos DB account
The first thing you’ll need to do is to create an Azure Cosmos DB account and then a container. After that, you can connect to your new database and start using it. Any of the following tutorials will walk you through these basic concepts:
• Build a console app using .NET, Java, Async Java, or Node.js
• Build a web app using .NET, Java, Node.js, or Xamarin

Importing data
After you’ve familiarized yourself with Azure Cosmos DB, you might want to try migrating some of your own data into it. You can easily do this using the Data Migration tool, an open-source solution that imports data to Azure Cosmos DB from a variety of sources. Migrate your data to Azure Cosmos DB walks you through installing the Data Migration tool, importing data from different data sources, and exporting from Azure Cosmos DB to JSON. You can also import data programmatically using the Azure Cosmos DB bulk executor library.

Querying data
The Azure Cosmos DB SQL API supports querying documents using SQL. Querying data using the SQL API covers how to do this, including a sample document and two sample SQL queries.

Distributing data globally
After you have some data in your database, you might want to distribute it to additional Azure regions. Set up global distribution using the SQL API shows you how to set up Azure Cosmos DB global distribution using the Azure portal, and then connect to a preferred region using the SQL API.

How-to guides
The Azure Cosmos DB documentation includes how-to guides for common SQL API-related tasks, such as tuning query performance, server-side programming, and working with DateTime and geospatial data types. There are dozens of non-API-specific guides in the Azure Cosmos DB documentation, too, with the first one starting here and the rest listed below it in the left-hand navigation pane.

Additional resources
Here are some additional resources that you may find useful:
• Sample applications that show you how to work with Azure Cosmos DB, including performing CRUD operations and other common operations. They’re organized by language: .NET, Java, Async Java, Node.js, Python, PowerShell, Azure CLI, and Azure Resource Manager.
• Release notes and references for various SDKs, libraries, resource providers, and so on. There are too many to list in this document, but you can find the first one here, with the rest listed below it in the left-hand navigation pane.
• Hands-on experience working with Azure Cosmos DB using the SQL API, JavaScript, and the .NET Core SDK, which you can find at the Azure Cosmos DB workshop on GitHub.
• Videos on the Azure Cosmos DB YouTube channel, which cover a broad range of topics.

Getting Started with the Cassandra API
Like all supported APIs in Azure Cosmos DB, the Cassandra API is based on a native wire protocol implementation—that is, an implementation that does not use any Cassandra source code. This lets you easily migrate your Cassandra apps to Azure Cosmos DB while preserving significant portions of your application logic, enables you to keep your apps portable, and lets you continue to remain cloud vendor-agnostic.

The Azure Cosmos DB Cassandra API lets you interact with data stored in Azure Cosmos DB using the Cassandra Query Language (CQL), Cassandra-based tools (like cqlsh), and Cassandra client drivers that you’re already familiar with. In many cases, you can switch from using Apache Cassandra to using the Azure Cosmos DB Cassandra API by merely changing a connection string—and realize all the benefits that using Azure Cosmos DB provides.

You can communicate with the Azure Cosmos DB Cassandra API through CQL v4 wire protocol compliant open-source Cassandra client drivers. Our online documentation provides more information on supported CQL commands, tools, limitations, and exceptions.

Quickstarts
Our 5-minute quickstarts can help you get started with the Azure Cosmos DB Cassandra API in the time it takes to walk down the hall for a cup of coffee. They’re organized by programming language, so you shouldn’t have trouble finding one that sparks your interest.

• .NET: Create an Azure Cosmos DB Cassandra API account, then build and review a sample .NET app by cloning an example from GitHub.
• Java: Create an Azure Cosmos DB Cassandra API account, then build and review a sample Java app by cloning an example from GitHub.
• Node.js: Create an Azure Cosmos DB Cassandra API account, then build and review a sample Node.js app by cloning an example from GitHub.
• Python: Create an Azure Cosmos DB Cassandra API account, then build and review a sample Python app by cloning an example from GitHub.

Tutorials
When you’re ready to dive deeper into the Cassandra API, the following tutorials are a great place to start. The first three tutorials listed below build on each other, so make sure to complete them in order. The tutorial on migrating data is self-contained.
Creating an Azure Cosmos DB account and managing data

Creating a Cassandra API account in Azure Cosmos DB describes how to create an Azure Cosmos DB Cassandra API account, and then use a Java sample project hosted on GitHub to create a project and dependencies, add a database and a table, and run the sample Java application.

Loading data

Next, you might want to try importing some of your own data. Loading sample data into a Cassandra API table in Azure Cosmos DB shows you how to load sample user data to a table in a Cassandra API account in Azure Cosmos DB by using a Java application.

Querying data

After you've imported your data, you'll be ready for our tutorial on querying data by using the Cassandra API.

Migrating data

If you have existing Cassandra workloads that are running on-premises or in the cloud, and you want to migrate them to Azure, then you may want to peruse this tutorial on migrating data to an Azure Cosmos DB Cassandra API account. It covers your various options, which include using the cqlsh COPY command or using Apache Spark.

Cassandra and Spark

Apache Cassandra is often used together with Apache Spark as components of a comprehensive analytics stack; Cassandra stores the data, and Spark handles the data processing, including in-memory analytics. You can do the same thing on Azure—by using the Azure Cosmos DB Cassandra API as a data store and either Azure Databricks or Azure HDInsight for the analytics. Connecting to the Azure Cosmos DB Cassandra API from Spark provides guidance on how to do this—including connectivity dependencies, Spark connector throughput configuration parameters, Data Definition Language (DDL) operations, basic Data Manipulation Language (DML) operations, and more.

How-to guides

The Azure Cosmos DB documentation includes several how-to guides for common Cassandra API-related tasks, such as using Spring Data with Azure Cosmos DB, connecting to the Azure Cosmos DB Cassandra API from Spark, managing Azure Cosmos DB Cassandra API resources using Azure Resource Manager templates, and Azure PowerShell samples for the Azure Cosmos DB Cassandra API. There are dozens of non-API-specific guides in the Azure Cosmos DB documentation, too, with the first one starting here and the rest listed below it in the left-hand navigation pane.

There are also several videos on the Azure Cosmos DB YouTube channel, which cover a broad range of topics that you may find useful.

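To make the cqlsh COPY migration option concrete, a minimal sketch (the table, columns, and file name are hypothetical) run from a cqlsh session connected to the target Cassandra API account looks like this:

```sql
-- Hypothetical table and CSV file; COPY streams rows from the local file
-- into the target table over the same CQL wire protocol.
COPY store.orders (order_id, customer, total)
  FROM 'orders.csv' WITH HEADER = TRUE;
```

For larger datasets, the Apache Spark path mentioned above is generally the better fit, since it parallelizes the writes across executors.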
Getting Started with the Azure Cosmos DB for MongoDB API

Like all supported APIs in Azure Cosmos DB, the Azure Cosmos DB for MongoDB API is based on a native wire protocol implementation—that is, an implementation that does not use any MongoDB source code. This lets you easily migrate your MongoDB apps to Azure Cosmos DB while preserving significant portions of your application logic, enables you to keep your apps portable, and lets you remain cloud vendor-agnostic.

Because supported MongoDB wire protocol versions will change over time, it's best to refer to our online documentation for information on wire protocol compatibility and support.

Quickstarts

Our 5-minute quickstarts can help you get started with the Azure Cosmos DB for MongoDB API in the time it takes to walk down the hall for a cup of coffee. They're organized by programming language, so you shouldn't have trouble finding one that sparks your interest.

Language Scenario

.NET Create an Azure Cosmos DB for MongoDB API account, a document database,
and a collection using the Azure portal, and then build and deploy a web app
based on the MongoDB .NET driver.

Java Create an Azure Cosmos DB for MongoDB API account, a document database, and
a collection using the Azure portal, and then build and deploy a web app based on
the MongoDB Java driver.

Node.js Connect an existing MongoDB app written in Node.js to an Azure Cosmos DB for
MongoDB API account. When you’re done, you will have a MEAN application
(MongoDB, Express, Angular, and Node.js) running on Azure Cosmos DB.

Python Build a simple Flask app by using the Azure Cosmos DB Emulator and the Azure
Cosmos DB for MongoDB API.

Xamarin Create an Azure Cosmos DB for MongoDB API account, a document database, and
a collection using the Azure portal, and then build a Xamarin.Forms app by using
the MongoDB .NET driver.

Golang Create an Azure Cosmos DB for MongoDB API account and then connect to it
using an existing MongoDB app written in Golang.

Tutorials

When you're ready to dive deeper into the Azure Cosmos DB for MongoDB API, the following tutorials are a great place to start. The tutorials listed under Creating an Azure Cosmos DB account assume you're starting from scratch and include instructions for creating an account. All of the other tutorials assume you already have an account.

Creating an Azure Cosmos DB account and managing data

The following tutorials walk you through creating an Azure Cosmos DB account and creating a collection, and then they show you how to connect to your new database and start using it.

Tutorial Scenario

Node.js console app Use a Node.js console app to connect to an Azure Cosmos DB for
MongoDB API database

Create a MongoDB app Create a MongoDB app with Express, Angular, and Node.js (the MEAN
with Angular stack), and then connect it to Azure Cosmos DB. You’ll create a Node.js
Express app with the Angular CLI; build the UI with Angular; create an
Azure Cosmos DB account using the Azure CLI; connect to Azure
Cosmos DB using Mongoose; and add Post, Put, and Delete functions.

Create a MongoDB app Similar to the above but uses React instead of Angular—and is a video
with React tutorial.
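
For orientation, the connection strings these tutorials rely on follow MongoDB's standard URI scheme. The sketch below assembles one in the shape Cosmos DB accounts typically use; the account name, key, port, and options are placeholders illustrating the format, not values to copy (always take the real string from the Azure portal):

```python
# Sketch: the MongoDB-style connection URI shape used by Cosmos DB accounts.
# Account name and key below are hypothetical placeholders.
def cosmos_mongo_uri(account: str, key: str) -> str:
    # The MongoDB API endpoint listens on port 10255 and requires TLS.
    return (f"mongodb://{account}:{key}@{account}.documents.azure.com:10255"
            f"/?ssl=true&replicaSet=globaldb")

uri = cosmos_mongo_uri("myaccount", "my-account-key")
print(uri)
```

Any MongoDB driver or tool that accepts a standard connection URI can then point at the account without code changes.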

Migrating data

After you've familiarized yourself with Azure Cosmos DB, you might want to try importing some of your own data. To migrate data into or out of Azure Cosmos DB for MongoDB API collections, you need to use mongoimport.exe or mongorestore.exe.

Querying data

Querying data using the Azure Cosmos DB for MongoDB API provides several example queries that show you how to query your data in Azure Cosmos DB using the MongoDB shell.

Distributing data globally

After you have some data in your database, you might want to distribute it to additional Azure regions. Set up global distribution using the Azure Cosmos DB for MongoDB API shows you how to set up Azure Cosmos DB global distribution using the Azure portal, verify your regional setup, and then connect to a preferred region using the Azure Cosmos DB for MongoDB API.

How-to guides

The Azure Cosmos DB documentation includes several how-to guides for common Azure Cosmos DB for MongoDB API-related tasks, such as getting the connection string, connecting using Studio 3T, distributing reads globally, using time-to-live (TTL) functionality to automatically expire data, and managing data indexing. There are dozens of non-API-specific guides in the Azure Cosmos DB documentation, too, with the first one starting here and the rest listed below it in the left-hand navigation pane.

There are also several videos on the Azure Cosmos DB YouTube channel, which cover a broad range of topics that you may find useful.
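
As a sketch of the mongoimport path described above (the account name, database, collection, and file are hypothetical placeholders; substitute the host, username, and key shown for your account in the Azure portal):

```sh
# Hypothetical account, database, collection, and file names.
mongoimport --host myaccount.documents.azure.com:10255 \
  --db mydb --collection users \
  --username myaccount --password "<account-key>" \
  --ssl --sslAllowInvalidCertificates \
  --type json --file users.json
```

mongorestore works the same way, taking a BSON dump instead of a JSON file.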

Getting Started with the Gremlin API

Like all supported APIs in Azure Cosmos DB, the Gremlin API is based on a native wire protocol implementation—that is, an implementation that does not use any Gremlin source code. This lets you easily migrate your Gremlin apps to Azure Cosmos DB while preserving significant portions of your application logic, enables you to keep your apps portable, and lets you remain cloud vendor-agnostic.

Azure Cosmos DB supports Apache TinkerPop's Gremlin graph traversal language, which you can use to create graph entities (vertices and edges), modify properties within those entities, perform queries and traversals, and delete entities. If you're not familiar with graph databases or Gremlin, the following resources may be helpful:

• Introduction to the Azure Cosmos DB Gremlin API provides an overview of graph databases and explains how you can use them to store massive graphs with billions of vertices and edges.

• Azure Cosmos DB Gremlin graph support provides a basic introduction to Gremlin, including examples, features, GraphSON (the Gremlin wire format), and the Gremlin steps supported by Azure Cosmos DB.

Quickstarts

Our 5-minute quickstarts can help you get started with the Azure Cosmos DB Gremlin API in the time it takes to walk down the hall for a cup of coffee. They're organized by programming language, so you shouldn't have trouble finding one that sparks your interest.

Language Scenario

Gremlin Create an Azure Cosmos DB Gremlin API account, database, and graph (container)
console using the Azure portal, and then use the Gremlin Console from Apache TinkerPop
to work with Gremlin API data.

.NET Create an Azure Cosmos DB Gremlin API account, database, and graph (container)
using the Azure portal, and then build and run a console app built using the open-
source driver Gremlin.Net.

Java Create a simple graph database using the Azure portal, and then create a Java
console app that works with a Gremlin API database by using the OSS Apache TinkerPop driver.

Node.js Create an Azure Cosmos DB Gremlin API account, database, and graph using the
Azure portal, and then use the open-source Gremlin Node.js driver to build and
run a console app.

Python Create an Azure Cosmos DB Gremlin API account by using the Azure portal, and
then use Python to build a console app by cloning an example from GitHub.

PHP Create an Azure Cosmos DB Gremlin API account by using the Azure portal, and
then use PHP to build a console app by cloning an example from GitHub.
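
To give a feel for the traversal language these quickstarts use, here is a small sketch (the vertex labels, property names, and values are hypothetical) of creating and querying a tiny graph from the Gremlin Console:

```
// Hypothetical graph data; the same traversals run in the TinkerPop
// Gremlin Console against an Azure Cosmos DB Gremlin API graph.
g.addV('person').property('id', 'thomas').property('age', 44)
g.addV('person').property('id', 'mary').property('age', 39)
g.V('thomas').addE('knows').to(g.V('mary'))   // add an edge between them
g.V('thomas').out('knows').values('id')       // traverse: whom does thomas know?
```

In Azure Cosmos DB, vertices in a partitioned graph also carry a partition key property; see the graph support article above for the supported Gremlin steps.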

Tutorials

When you're ready to dive deeper into the Gremlin API, the following tutorials are a great place to start:

Migrating data

You can import data programmatically using the bulk executor library for the Gremlin API on GitHub. Our how-to guide for using the bulk executor library with the Gremlin API provides instructions for using it to import and update graph objects in an Azure Cosmos DB Gremlin API container.

Querying data

Querying data using the Gremlin API provides sample documents and queries to get you started with Gremlin queries.

How-to guides

The Azure Cosmos DB documentation includes several how-to guides for common Gremlin API-related tasks, such as using a partitioned graph in Azure Cosmos DB, optimizing Gremlin queries, and Azure PowerShell samples for the Azure Cosmos DB Gremlin API. There are dozens of non-API-specific guides in the Azure Cosmos DB documentation, too, with the first one starting here and the rest listed below it in the left-hand navigation pane.

There are also several videos on the Azure Cosmos DB YouTube channel, which cover a broad range of topics that you may find useful.

Getting Started with the Table API

The Azure Cosmos DB Table API supports applications that are written for Azure Table storage, augmenting them with premium capabilities such as turnkey global distribution, dedicated throughput worldwide, single-digit-millisecond latencies at the 99th percentile, guaranteed high availability, and automatic secondary indexing. Our introduction to the Table API takes a closer look at the benefits of moving from Azure Table storage to the Azure Cosmos DB Table API.

Quickstarts

Our 5-minute quickstarts can help you get started with the Azure Cosmos DB Table API in the time it takes to walk down the hall for a cup of coffee. They're organized by programming language, so you shouldn't have trouble finding one that sparks your interest.

Language Scenario

.NET Use the Azure portal to create an Azure Cosmos DB Table API account, use Data
Explorer to create tables and entities, and then build a .NET app that connects to
the Table API by cloning an example from GitHub.

Java Use the Azure portal to create an Azure Cosmos DB Table API account, use Data
Explorer to create tables and entities, and then build a Java app that connects to
the Table API by cloning an example from GitHub.

Node.js Use the Azure portal to create an Azure Cosmos DB Table API account, use Data
Explorer to create tables and entities, and then build a Node.js app that connects
to the Table API by cloning an example from GitHub.

Python Use the Azure portal to create an Azure Cosmos DB Table API account, use Data
Explorer to create tables and entities, and then build a Python app that connects to
the Table API by cloning an example from GitHub.

Tutorials

When you're ready to dive deeper into the Table API, the following tutorials are a great place to start. The tutorials listed under Step 1: Creating an Azure Cosmos DB account assume you're starting from scratch and include instructions for creating an account. All of the other tutorials assume you already have an account.

Step 1: Creating an Azure Cosmos DB account and managing data

Get started with Azure Cosmos DB Table API and Azure Table storage walks you through creating an Azure Cosmos DB account and creating a table, and then shows you how to connect to your new database and start using it. You learn how to enable functionality in the app.config file, create a table using the Table API, add an entity to a table, insert a batch of entities, retrieve a single entity, query entities

using automatic secondary indexes, replace an entity, delete an entity, and delete a table.

Step 2: Importing data

Migrate your data to an Azure Cosmos DB Table API account walks you through importing data for use with the Table API. If your data is in Azure Table storage, you can use either the Data Migration Tool or AzCopy to import it. If your data is in an Azure Cosmos DB Table API (preview) account, you'll need to use the Data Migration Tool to import it. Both methods are addressed in the tutorial.

Step 3: Querying data

The Azure Cosmos DB Table API supports OData and LINQ queries against key/value (table) data. Both methods are covered in our tutorial on querying Azure Cosmos DB by using the Table API.

Step 4: Distributing data globally

After you have some data in your database, you might want to distribute it to additional Azure regions. Set up global distribution using the Table API walks you through replicating data to additional Azure regions by using the Azure portal, and then connecting to a preferred region by using the Table API.

How-to guides

The Azure Cosmos DB documentation includes several how-to guides for common Table API-related tasks, such as building apps with the Table API, guidance on Table storage design, and Azure PowerShell samples for the Azure Cosmos DB Table API. There are dozens of non-API-specific guides in the Azure Cosmos DB documentation, too, with the first one starting here and the rest listed below it in the left-hand navigation pane.

There are also several videos on the Azure Cosmos DB YouTube channel, which cover a broad range of topics that you may find useful.
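
The key/value (table) data model behind those steps can be sketched in a few lines: every entity is addressed by a PartitionKey plus RowKey pair, point reads use the full key, and queries typically filter on the partition first. This is a minimal in-memory illustration (all names and values are hypothetical), not the actual Table SDK:

```python
# In-memory sketch of Table-style entities keyed by (PartitionKey, RowKey).
table = {}

def insert_entity(entity):
    """Store an entity under its (PartitionKey, RowKey) address."""
    table[(entity["PartitionKey"], entity["RowKey"])] = entity

insert_entity({"PartitionKey": "home", "RowKey": "001",
               "description": "Take out the trash", "priority": 200})
insert_entity({"PartitionKey": "home", "RowKey": "002",
               "description": "Mow the lawn", "priority": 100})
insert_entity({"PartitionKey": "office", "RowKey": "001",
               "description": "File the report", "priority": 300})

# Point read: one entity by its full key.
print(table[("home", "001")]["description"])  # → Take out the trash

# Partition query: every entity sharing a PartitionKey.
home_tasks = [e for (pk, _), e in table.items() if pk == "home"]
print(len(home_tasks))  # → 2
```

Choosing a PartitionKey that groups the entities you query together is the main design decision this model asks of you, in Azure Table storage and the Table API alike.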

Conclusion
If you’re looking for a NoSQL database, you owe it to yourself to consider Azure Cosmos DB. No other multi-model distributed database offers turnkey global distribution; unlimited elastic scalability of storage and throughput; guaranteed single-digit-millisecond latency; five well-defined consistency levels; and comprehensive SLAs for availability, latency, throughput, and consistency. Regardless of what your next app is built to do, if it needs to do it with low latency at a global scale, Azure Cosmos DB can help you get there.

Sign up for a free Azure account and get started with Azure Cosmos DB today – or try Azure Cosmos DB for free without an Azure subscription.

