
Azure application architecture fundamentals

3/10/2022 • 3 minutes to read

This library of content presents a structured approach for designing applications on Azure that are scalable,
secure, resilient, and highly available. The guidance is based on proven practices that we have learned from
customer engagements.

Introduction
The cloud is changing how applications are designed and secured. Instead of monoliths, applications are
decomposed into smaller, decentralized services. These services communicate through APIs or by using
asynchronous messaging or eventing. Applications scale horizontally, adding new instances as demand requires.
These trends bring new challenges. Application states are distributed. Operations are done in parallel and
asynchronously. Applications must be resilient when failures occur. Malicious actors continuously target
applications. Deployments must be automated and predictable. Monitoring and telemetry are critical for gaining
insight into the system. This guide is designed to help you navigate these changes.

TRADITIONAL ON-PREMISES                 MODERN CLOUD

Monolithic                              Decomposed
Designed for predictable scalability    Designed for elastic scale
Relational database                     Polyglot persistence (mix of storage technologies)
Synchronized processing                 Asynchronous processing
Design to avoid failures (MTBF)         Design for failure (MTTR)
Occasional large updates                Frequent small updates
Manual management                       Automated self-management
Snowflake servers                       Immutable infrastructure

How this guidance is structured


The Azure application architecture fundamentals guidance is organized as a series of steps, from the architecture
and design to implementation. For each step, there is supporting guidance that will help you with the design of
your application architecture.
Architecture styles
The first decision point is the most fundamental. What kind of architecture are you building? It might be a
microservices architecture, a more traditional N-tier application, or a big data solution. We have identified
several distinct architecture styles. There are benefits and challenges to each.
Learn more: Architecture styles

Technology choices
Once you know the type of architecture you are building, you can start to choose the main technology pieces for
the architecture. The following technology choices are critical:
Compute refers to the hosting model for the computing resources that your applications run on. For
more information, see Choose a compute service.
Data stores include databases but also storage for message queues, caches, logs, and anything else that
an application might persist to storage. For more information, see Choose a data store.
Messaging technologies enable asynchronous messages between components of the system. For more
information, see Choose a messaging service.
You will probably have to make additional technology choices along the way, but these three elements
(compute, data, and messaging) are central to most cloud applications and will determine many aspects of your
design.

Design the architecture


Once you have chosen the architecture style and the major technology components, you are ready to tackle the
specific design of your application. Every application is different, but the following resources can help you along
the way:
Reference architectures
Depending on your scenario, one of our reference architectures may be a good starting point. Each reference
architecture includes recommended practices, along with considerations for scalability, availability, security,
resilience, and other aspects of the design. Most also include a deployable solution or reference implementation.
Design principles
We have identified 10 high-level design principles that will make your application more scalable, resilient, and
manageable. They apply to any architecture style, so keep them in mind throughout the design
process. For more information, see Design principles.
Design patterns
Software design patterns are repeatable solutions to well-known, specific problems. Our catalog of
Cloud design patterns addresses specific challenges in distributed systems. The patterns address aspects such as
availability, resiliency, operational excellence, performance, and security. You can find our
catalog of design patterns here.
Best practices
Our best practices articles cover various design considerations including API design, autoscaling, data
partitioning, caching, and so forth. Review these and apply the best practices that are appropriate for your
application.
Security best practices
Our security best practices describe how to ensure that the confidentiality, integrity, and availability of your
application aren't compromised by malicious actors.
Quality pillars
A successful cloud application will focus on five pillars of software quality: Cost optimization, Operational
excellence, Performance efficiency, Reliability, and Security.
Leverage the Microsoft Azure Well-Architected Framework to assess your architecture across these five pillars.

Next steps
Architecture styles
Architecture styles
3/10/2022 • 5 minutes to read

An architecture style is a family of architectures that share certain characteristics. For example, N-tier is a
common architecture style. More recently, microservice architectures have started to gain favor. Architecture
styles don't require the use of particular technologies, but some technologies are well-suited for certain
architectures. For example, containers are a natural fit for microservices.
We have identified a set of architecture styles that are commonly found in cloud applications. The article for
each style includes:
A description and logical diagram of the style.
Recommendations for when to choose this style.
Benefits, challenges, and best practices.
A recommended deployment using relevant Azure services.

A quick tour of the styles


This section gives a quick tour of the architecture styles that we've identified, along with some high-level
considerations for their use. Read more details in the linked topics.
N-tier
N-tier is a traditional architecture for enterprise applications. Dependencies are managed by dividing the
application into layers that perform logical functions, such as presentation, business logic, and data access. A
layer can only call into layers that sit below it. However, this horizontal layering can be a liability. It can be hard to
introduce changes in one part of the application without touching the rest of the application. That makes
frequent updates a challenge, limiting how quickly new features can be added.
N-tier is a natural fit for migrating existing applications that already use a layered architecture. For that reason,
N-tier is most often seen in infrastructure as a service (IaaS) solutions, or applications that use a mix of IaaS and
managed services.
Web-Queue-Worker
For a purely PaaS solution, consider a Web-Queue-Worker architecture. In this style, the application has a web
front end that handles HTTP requests and a back-end worker that performs CPU-intensive tasks or long-running
operations. The front end communicates to the worker through an asynchronous message queue.
Web-queue-worker is suitable for relatively simple domains with some resource-intensive tasks. Like N-tier, the
architecture is easy to understand. The use of managed services simplifies deployment and operations. But with
complex domains, it can be hard to manage dependencies. The front end and the worker can easily become
large, monolithic components that are hard to maintain and update. As with N-tier, this can reduce the frequency
of updates and limit innovation.
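The decoupling described above can be sketched in a few lines of Python, using the standard library's queue as a stand-in for a managed message queue such as Azure Queue Storage (the job payloads and the processing step are illustrative):

```python
import queue
import threading

# Stand-in for a managed message queue (e.g., Azure Queue Storage).
work_queue = queue.Queue()
results = {}

def web_front_end(job_id, payload):
    """Web front end: accept the HTTP request, enqueue the work, return at once."""
    work_queue.put((job_id, payload))
    return {"status": "accepted", "job_id": job_id}

def worker():
    """Back-end worker: drain the queue and perform the long-running task."""
    while True:
        job_id, payload = work_queue.get()
        if job_id is None:                   # sentinel to stop the worker
            break
        results[job_id] = payload.upper()    # placeholder for real processing
        work_queue.task_done()

t = threading.Thread(target=worker)
t.start()
web_front_end(1, "render report")
web_front_end(2, "resize image")
work_queue.put((None, None))
t.join()
print(results)   # {1: 'RENDER REPORT', 2: 'RESIZE IMAGE'}
```

The point of the sketch is that the front end returns immediately; only the queue couples it to the worker, so either side can be scaled or redeployed independently.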
Microservices
If your application has a more complex domain, consider moving to a Microservices architecture. A
microservices application is composed of many small, independent services. Each service implements a single
business capability. Services are loosely coupled, communicating through API contracts.
Each service can be built by a small, focused development team. Individual services can be deployed without a
lot of coordination between teams, which encourages frequent updates. A microservice architecture is more
complex to build and manage than either N-tier or web-queue-worker. It requires a mature development and
DevOps culture. But done right, this style can lead to higher release velocity, faster innovation, and a more
resilient architecture.
Event-driven architecture
Event-Driven Architectures use a publish-subscribe (pub-sub) model, where producers publish events, and
consumers subscribe to them. The producers are independent from the consumers, and consumers are
independent from each other.
Consider an event-driven architecture for applications that ingest and process a large volume of data with very
low latency, such as IoT solutions. The style is also useful when different subsystems must perform different
types of processing on the same event data.
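A minimal in-process sketch of the pub-sub model, with an illustrative EventBus standing in for a managed broker such as Azure Event Grid (the topic names and handlers are invented for the example):

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process pub/sub: producers publish, consumers subscribe.
    A stand-in for a managed broker; not a real Azure API."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The producer does not know who (or how many) consumers exist.
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
audit_log, alerts = [], []
# Two independent consumers perform different processing on the same events.
bus.subscribe("sensor/temperature", audit_log.append)
bus.subscribe("sensor/temperature",
              lambda e: alerts.append(e) if e["value"] > 90 else None)

bus.publish("sensor/temperature", {"device": "d1", "value": 72})
bus.publish("sensor/temperature", {"device": "d2", "value": 95})
print(len(audit_log), len(alerts))   # 2 1
```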
Big Data, Big Compute
Big Data and Big Compute are specialized architecture styles for workloads that fit certain specific profiles.
Big data divides a very large dataset into chunks, performing parallel processing across the entire set, for
analysis and reporting. Big compute, also called high-performance computing (HPC), makes parallel
computations across a large number (thousands) of cores. Domains include simulations, modeling, and 3-D
rendering.

Architecture styles as constraints


An architecture style places constraints on the design, including the set of elements that can appear and the
allowed relationships between those elements. Constraints guide the "shape" of an architecture by restricting the
universe of choices. When an architecture conforms to the constraints of a particular style, certain desirable
properties emerge.
For example, the constraints in microservices include:
A service represents a single responsibility.
Every service is independent of the others.
Data is private to the service that owns it. Services do not share data.
By adhering to these constraints, what emerges is a system where services can be deployed independently,
faults are isolated, frequent updates are possible, and it's easy to introduce new technologies into the
application.
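The constraints above can be illustrated with a toy sketch: two "services" that each own their data privately and interact only through an API contract (the service names, methods, and data are hypothetical):

```python
class InventoryService:
    """Single responsibility: inventory. Its data is private to it."""
    def __init__(self):
        self._stock = {"sku-1": 5}     # no other service reads this directly

    def reserve(self, sku):
        if self._stock.get(sku, 0) > 0:
            self._stock[sku] -= 1
            return True
        return False

class OrderService:
    """Independent of InventoryService's internals; depends only on its API."""
    def __init__(self, inventory_api):
        self._inventory = inventory_api
        self._orders = []              # private to this service

    def place_order(self, sku):
        if self._inventory.reserve(sku):   # API call, never shared data
            self._orders.append(sku)
            return "accepted"
        return "rejected"

orders = OrderService(InventoryService())
outcomes = [orders.place_order("sku-1") for _ in range(6)]
print(outcomes)   # ['accepted', 'accepted', 'accepted', 'accepted', 'accepted', 'rejected']
```

Because neither class touches the other's private state, either one could be reimplemented, redeployed, or moved behind a network boundary without changing the other.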
Before choosing an architecture style, make sure that you understand the underlying principles and constraints
of that style. Otherwise, you can end up with a design that conforms to the style at a superficial level, but does
not achieve the full potential of that style. It's also important to be pragmatic. Sometimes it's better to relax a
constraint, rather than insist on architectural purity.
The following table summarizes how each style manages dependencies, and the types of domain that are best
suited for each.

N-tier
    Dependency management: Horizontal tiers divided by subnet.
    Domain type: Traditional business domain. Frequency of updates is low.

Web-Queue-Worker
    Dependency management: Front-end and back-end jobs, decoupled by async messaging.
    Domain type: Relatively simple domain with some resource-intensive tasks.

Microservices
    Dependency management: Vertically (functionally) decomposed services that call each other through APIs.
    Domain type: Complicated domain. Frequent updates.

Event-driven architecture
    Dependency management: Producer/consumer. Independent view per subsystem.
    Domain type: IoT and real-time systems.

Big data
    Dependency management: Divide a huge dataset into small chunks. Parallel processing on local datasets.
    Domain type: Batch and real-time data analysis. Predictive analysis using ML.

Big compute
    Dependency management: Data allocation to thousands of cores.
    Domain type: Compute-intensive domains such as simulation.

Consider challenges and benefits


Constraints also create challenges, so it's important to understand the trade-offs when adopting any of these
styles. Do the benefits of the architecture style outweigh the challenges, for this subdomain and bounded
context?
Here are some of the types of challenges to consider when selecting an architecture style:
Complexity. Is the complexity of the architecture justified for your domain? Conversely, is the style too
simplistic for your domain? In that case, you risk ending up with a "big ball of mud", because the
architecture does not help you to manage dependencies cleanly.
Asynchronous messaging and eventual consistency. Asynchronous messaging can be used to
decouple services, and to increase reliability (because messages can be retried) and scalability. However, this
also creates challenges in handling eventual consistency, as well as the possibility of duplicate messages.
Inter-service communication. As you decompose an application into separate services, there is a risk
that communication between services will cause unacceptable latency or create network congestion (for
example, in a microservices architecture).
Manageability. How hard is it to manage the application, monitor it, deploy updates, and so on?
Big compute architecture style
3/10/2022 • 3 minutes to read

The term big compute describes large-scale workloads that require a large number of cores, often numbering in
the hundreds or thousands. Scenarios include image rendering, fluid dynamics, financial risk modeling, oil
exploration, drug design, and engineering stress analysis, among others.

Here are some typical characteristics of big compute applications:


The work can be split into discrete tasks, which can be run across many cores simultaneously.
Each task is finite. It takes some input, does some processing, and produces output. The entire application
runs for a finite amount of time (minutes to days). A common pattern is to provision a large number of cores
in a burst, and then spin down to zero once the application completes.
The application does not need to stay up 24/7. However, the system must handle node failures or application
crashes.
For some applications, tasks are independent and can run in parallel. In other cases, tasks are tightly coupled,
meaning they must interact or exchange intermediate results. In that case, consider using high-speed
networking technologies such as InfiniBand and remote direct memory access (RDMA).
Depending on your workload, you might use compute-intensive VM sizes (H16r, H16mr, and A9).

When to use this architecture


Computationally intensive operations such as simulation and number crunching.
Simulations that are computationally intensive and must be split across CPUs in multiple computers (10-1000s).
Simulations that require too much memory for one computer, and must be split across multiple computers.
Long-running computations that would take too long to complete on a single computer.
Smaller computations that must be run 100s or 1000s of times, such as Monte Carlo simulations.
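A Monte Carlo workload of this kind can be sketched with Python's standard library. Each task is finite and independent, so the same pattern scales from a small pool on one machine to thousands of cores in a cluster (the sample counts and seeds are illustrative; for CPU-bound work across cores you would substitute a process pool):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def sample_hits(n, seed):
    """One finite, independent task: count random points inside the
    unit quarter-circle (a Monte Carlo estimate of pi)."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)

TASKS, SAMPLES = 8, 100_000

# "Embarrassingly parallel": tasks share no state, so they can be fanned
# out across many cores -- or thousands of nodes -- then gathered at the end.
with ThreadPoolExecutor(max_workers=4) as pool:
    hit_counts = list(pool.map(sample_hits, [SAMPLES] * TASKS, range(TASKS)))

pi_estimate = 4 * sum(hit_counts) / (TASKS * SAMPLES)
print(round(pi_estimate, 3))   # a value near 3.14
```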

Benefits
High performance with "embarrassingly parallel" processing.
Can harness hundreds or thousands of computer cores to solve large problems faster.
Access to specialized high-performance hardware, with dedicated high-speed InfiniBand networks.
You can provision VMs as needed to do work, and then tear them down.

Challenges
Managing the VM infrastructure.
Managing the volume of number crunching.
Provisioning thousands of cores in a timely manner.
For tightly coupled tasks, adding more cores can have diminishing returns. You may need to experiment to
find the optimum number of cores.

Big compute using Azure Batch


Azure Batch is a managed service for running large-scale high-performance computing (HPC) applications.
Using Azure Batch, you configure a VM pool, and upload the applications and data files. Then the Batch service
provisions the VMs, assigns tasks to the VMs, runs the tasks, and monitors the progress. Batch can automatically
scale out the VMs in response to the workload. Batch also provides job scheduling.

Big compute running on Virtual Machines


You can use Microsoft HPC Pack to administer a cluster of VMs, and schedule and monitor HPC jobs. With this
approach, you must provision and manage the VMs and network infrastructure. Consider this approach if you
have existing HPC workloads and want to move some or all of them to Azure. You can move the entire HPC cluster to
Azure, or you can keep your HPC cluster on-premises but use Azure for burst capacity. For more information,
see Batch and HPC solutions for large-scale computing workloads.
HPC Pack deployed to Azure
In this scenario, the HPC cluster is created entirely within Azure.
The head node provides management and job scheduling services to the cluster. For tightly coupled tasks, use
an RDMA network that provides very high bandwidth, low latency communication between VMs. For more
information, see Deploy an HPC Pack 2016 cluster in Azure.
Burst an HPC cluster to Azure
In this scenario, an organization is running HPC Pack on-premises, and uses Azure VMs for burst capacity. The
cluster head node is on-premises. ExpressRoute or VPN Gateway connects the on-premises network to the
Azure VNet.

Next steps
Choose an Azure compute service for your application
High Performance Computing (HPC) on Azure
HPC cluster deployed in the cloud
Big data architecture style
3/10/2022 • 10 minutes to read

A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or
complex for traditional database systems.

Big data solutions typically involve one or more of the following types of workload:
Batch processing of big data sources at rest.
Real-time processing of big data in motion.
Interactive exploration of big data.
Predictive analytics and machine learning.
Most big data architectures include some or all of the following components:
Data sources: All big data solutions start with one or more data sources. Examples include:
Application data stores, such as relational databases.
Static files produced by applications, such as web server log files.
Real-time data sources, such as IoT devices.
Data storage: Data for batch processing operations is typically stored in a distributed file store that can
hold high volumes of large files in various formats. This kind of store is often called a data lake. Options
for implementing this storage include Azure Data Lake Store or blob containers in Azure Storage.
Batch processing: Because the data sets are so large, often a big data solution must process data files
using long-running batch jobs to filter, aggregate, and otherwise prepare the data for analysis. Usually
these jobs involve reading source files, processing them, and writing the output to new files. Options
include running U-SQL jobs in Azure Data Lake Analytics, using Hive, Pig, or custom Map/Reduce jobs in
an HDInsight Hadoop cluster, or using Java, Scala, or Python programs in an HDInsight Spark cluster.
Real-time message ingestion: If the solution includes real-time sources, the architecture must include
a way to capture and store real-time messages for stream processing. This might be a simple data store,
where incoming messages are dropped into a folder for processing. However, many solutions need a
message ingestion store to act as a buffer for messages, and to support scale-out processing, reliable
delivery, and other message queuing semantics. Options include Azure Event Hubs, Azure IoT Hub, and
Kafka.
Stream processing: After capturing real-time messages, the solution must process them by filtering,
aggregating, and otherwise preparing the data for analysis. The processed stream data is then written to
an output sink. Azure Stream Analytics provides a managed stream processing service based on
perpetually running SQL queries that operate on unbounded streams. You can also use open source
Apache streaming technologies like Storm and Spark Streaming in an HDInsight cluster.
Analytical data store: Many big data solutions prepare data for analysis and then serve the processed
data in a structured format that can be queried using analytical tools. The analytical data store used to
serve these queries can be a Kimball-style relational data warehouse, as seen in most traditional business
intelligence (BI) solutions. Alternatively, the data could be presented through a low-latency NoSQL
technology such as HBase, or an interactive Hive database that provides a metadata abstraction over data
files in the distributed data store. Azure Synapse Analytics provides a managed service for large-scale,
cloud-based data warehousing. HDInsight supports Interactive Hive, HBase, and Spark SQL, which can
also be used to serve data for analysis.
Analysis and reporting: The goal of most big data solutions is to provide insights into the data through
analysis and reporting. To empower users to analyze the data, the architecture may include a data
modeling layer, such as a multidimensional OLAP cube or tabular data model in Azure Analysis Services.
It might also support self-service BI, using the modeling and visualization technologies in Microsoft
Power BI or Microsoft Excel. Analysis and reporting can also take the form of interactive data exploration
by data scientists or data analysts. For these scenarios, many Azure services support analytical
notebooks, such as Jupyter, enabling these users to leverage their existing skills with Python or R. For
large-scale data exploration, you can use Microsoft R Server, either standalone or with Spark.
Orchestration: Most big data solutions consist of repeated data processing operations, encapsulated in
workflows, that transform source data, move data between multiple sources and sinks, load the
processed data into an analytical data store, or push the results straight to a report or dashboard. To
automate these workflows, you can use an orchestration technology such as Azure Data Factory or Apache
Oozie and Sqoop.
Azure includes many services that can be used in a big data architecture. They fall roughly into two categories:
Managed services, including Azure Data Lake Store, Azure Data Lake Analytics, Azure Synapse Analytics,
Azure Stream Analytics, Azure Event Hubs, Azure IoT Hub, and Azure Data Factory.
Open source technologies based on the Apache Hadoop platform, including HDFS, HBase, Hive, Pig, Spark,
Storm, Oozie, Sqoop, and Kafka. These technologies are available on Azure in the Azure HDInsight service.
These options are not mutually exclusive, and many solutions combine open source technologies with Azure
services.

When to use this architecture


Consider this architecture style when you need to:
Store and process data in volumes too large for a traditional database.
Transform unstructured data for analysis and reporting.
Capture, process, and analyze unbounded streams of data in real time, or with low latency.
Use Azure Machine Learning or Microsoft Cognitive Services.

Benefits
Technology choices. You can mix and match Azure managed services and Apache technologies in
HDInsight clusters, to capitalize on existing skills or technology investments.
Performance through parallelism. Big data solutions take advantage of parallelism, enabling high-
performance solutions that scale to large volumes of data.
Elastic scale. All of the components in the big data architecture support scale-out provisioning, so that you
can adjust your solution to small or large workloads, and pay only for the resources that you use.
Interoperability with existing solutions. The components of the big data architecture are also used for
IoT processing and enterprise BI solutions, enabling you to create an integrated solution across data
workloads.

Challenges
Complexity. Big data solutions can be extremely complex, with numerous components to handle data
ingestion from multiple data sources. It can be challenging to build, test, and troubleshoot big data processes.
Moreover, there may be a large number of configuration settings across multiple systems that must be used
in order to optimize performance.
Skillset. Many big data technologies are highly specialized, and use frameworks and languages that are not
typical of more general application architectures. On the other hand, big data technologies are evolving new
APIs that build on more established languages. For example, the U-SQL language in Azure Data Lake
Analytics is based on a combination of Transact-SQL and C#. Similarly, SQL-based APIs are available for Hive,
HBase, and Spark.
Technology maturity. Many of the technologies used in big data are evolving. While core Hadoop
technologies such as Hive and Pig have stabilized, emerging technologies such as Spark introduce extensive
changes and enhancements with each new release. Managed services such as Azure Data Lake Analytics and
Azure Data Factory are relatively young, compared with other Azure services, and will likely evolve over time.
Security. Big data solutions usually rely on storing all static data in a centralized data lake. Securing access
to this data can be challenging, especially when the data must be ingested and consumed by multiple
applications and platforms.

Best practices
Leverage parallelism. Most big data processing technologies distribute the workload across multiple
processing units. This requires that static data files are created and stored in a splittable format.
Distributed file systems such as HDFS can optimize read and write performance, and the actual
processing is performed by multiple cluster nodes in parallel, which reduces overall job times.
Partition data. Batch processing usually happens on a recurring schedule, for example weekly or
monthly. Partition data files, and data structures such as tables, based on temporal periods that match the
processing schedule. That simplifies data ingestion and job scheduling, and makes it easier to
troubleshoot failures. Also, partitioning tables that are used in Hive, U-SQL, or SQL queries can
significantly improve query performance.
Apply schema-on-read semantics. Using a data lake lets you combine storage for files in multiple
formats, whether structured, semi-structured, or unstructured. Use schema-on-read semantics, which
project a schema onto the data when the data is being processed, not when it is stored. This builds
flexibility into the solution, and prevents bottlenecks during data ingestion caused by data validation and
type checking.
Process data in-place. Traditional BI solutions often use an extract, transform, and load (ETL) process to
move data into a data warehouse. With larger volumes of data, and a greater variety of formats, big data
solutions generally use variations of ETL, such as transform, extract, and load (TEL). With this approach,
the data is processed within the distributed data store, transforming it to the required structure, before
moving the transformed data into an analytical data store.
Balance utilization and time costs. For batch processing jobs, it's important to consider two factors:
the per-unit cost of the compute nodes, and the per-minute cost of using those nodes to complete the
job. For example, a batch job may take eight hours with four cluster nodes. However, it might turn out
that the job uses all four nodes only during the first two hours, and after that, only two nodes are
required. In that case, running the entire job on two nodes would increase the total job time, but would
not double it, so the total cost would be less. In some business scenarios, a longer processing time may
be preferable to the higher cost of using underutilized cluster resources.
Separate cluster resources. When deploying HDInsight clusters, you will normally achieve better
performance by provisioning separate cluster resources for each type of workload. For example, although
Spark clusters include Hive, if you need to perform extensive processing with both Hive and Spark, you
should consider deploying separate dedicated Spark and Hadoop clusters. Similarly, if you are using
HBase and Storm for low latency stream processing and Hive for batch processing, consider separate
clusters for Storm, HBase, and Hadoop.
Orchestrate data ingestion. In some cases, existing business applications may write data files for batch
processing directly into Azure storage blob containers, where they can be consumed by HDInsight or
Azure Data Lake Analytics. However, you will often need to orchestrate the ingestion of data from on-
premises or external data sources into the data lake. Use an orchestration workflow or pipeline, such as
those supported by Azure Data Factory or Oozie, to achieve this in a predictable and centrally
manageable fashion.
Scrub sensitive data early. The data ingestion workflow should scrub sensitive data early in the
process, to avoid storing it in the data lake.
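The trade-off described under "Balance utilization and time costs" can be made concrete with back-of-the-envelope arithmetic (the node-hour price is hypothetical; the job profile is the eight-hour, four-node example above):

```python
# Hypothetical job profile: 8 hours on 4 nodes, but all 4 nodes are
# needed only for the first 2 hours; 2 nodes suffice for the last 6.
PRICE_PER_NODE_HOUR = 1.00                      # illustrative rate

cost_four_nodes = 4 * 8 * PRICE_PER_NODE_HOUR   # 32 node-hours of spend

# On 2 nodes, the 4-node phase takes twice as long (4 hours); the
# 2-node phase is unchanged (6 hours): 10 hours total -- longer, not double.
hours_two_nodes = (2 * 4 / 2) + 6
cost_two_nodes = 2 * hours_two_nodes * PRICE_PER_NODE_HOUR

print(cost_four_nodes, cost_two_nodes)          # 32.0 20.0
```

So the smaller cluster takes 25% longer but costs roughly 37% less, which is why a longer run time on better-utilized nodes can be the right business choice.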

IoT architecture
Internet of Things (IoT) is a specialized subset of big data solutions. The following diagram shows a possible
logical architecture for IoT. The diagram emphasizes the event-streaming components of the architecture.

The cloud gateway ingests device events at the cloud boundary, using a reliable, low latency messaging
system.
Devices might send events directly to the cloud gateway, or through a field gateway. A field gateway is a
specialized device or software, usually colocated with the devices, that receives events and forwards them to the
cloud gateway. The field gateway might also preprocess the raw device events, performing functions such as
filtering, aggregation, or protocol transformation.
After ingestion, events go through one or more stream processors that can route the data (for example, to
storage) or perform analytics and other processing.
The following are some common types of processing. (This list is certainly not exhaustive.)
Writing event data to cold storage, for archiving or batch analytics.
Hot path analytics, analyzing the event stream in (near) real time, to detect anomalies, recognize patterns
over rolling time windows, or trigger alerts when a specific condition occurs in the stream.
Handling special types of non-telemetry messages from devices, such as notifications and alarms.
Machine learning.
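A hot-path step such as rolling-window anomaly detection can be sketched as follows (the window size, threshold, and sample readings are illustrative; a production system would use a managed stream processor such as Azure Stream Analytics rather than hand-rolled code):

```python
from collections import deque

class RollingAnomalyDetector:
    """Hot-path sketch: flag a reading that deviates sharply from the
    rolling mean of the last `window` accepted readings."""
    def __init__(self, window=5, threshold=10.0):
        self.readings = deque(maxlen=window)
        self.threshold = threshold

    def ingest(self, value):
        window_full = len(self.readings) == self.readings.maxlen
        anomaly = (window_full and
                   abs(value - sum(self.readings) / len(self.readings))
                   > self.threshold)
        if not anomaly:
            # Keep anomalous spikes out of the baseline window.
            self.readings.append(value)
        return anomaly

detector = RollingAnomalyDetector(window=3, threshold=5.0)
stream = [20.0, 21.0, 20.5, 20.8, 45.0, 21.2]
flags = [detector.ingest(v) for v in stream]
print(flags)   # [False, False, False, False, True, False]
```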
The boxes that are shaded gray show components of an IoT system that are not directly related to event
streaming, but are included here for completeness.
The device registry is a database of the provisioned devices, including the device IDs and usually device
metadata, such as location.
The provisioning API is a common external interface for provisioning and registering new devices.
Some IoT solutions allow command and control messages to be sent to devices.

This section has presented a very high-level view of IoT, and there are many subtleties and challenges to
consider. For a more detailed reference architecture and discussion, see the Microsoft Azure IoT Reference
Architecture (PDF download).

Next steps
Learn more about big data architectures.
Learn more about IoT solutions.
Event-driven architecture style
3/10/2022 • 3 minutes to read

An event-driven architecture consists of event producers that generate a stream of events, and event
consumers that listen for the events.

Events are delivered in near real time, so consumers can respond immediately to events as they occur. Producers
are decoupled from consumers — a producer doesn't know which consumers are listening. Consumers are also
decoupled from each other, and every consumer sees all of the events. This differs from a Competing
Consumers pattern, where consumers pull messages from a queue and a message is processed just once
(assuming no errors). In some systems, such as IoT, events must be ingested at very high volumes.
An event-driven architecture can use a pub/sub model or an event stream model.
Pub/sub : The messaging infrastructure keeps track of subscriptions. When an event is published, it sends
the event to each subscriber. After an event is received, it cannot be replayed, and new subscribers do not
see the event.
Event streaming : Events are written to a log. Events are strictly ordered (within a partition) and durable.
Clients don't subscribe to the stream, instead a client can read from any part of the stream. The client is
responsible for advancing its position in the stream. That means a client can join at any time, and can
replay events.
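As a rough illustration of the difference, the sketch below models an event stream as an in-memory, append-only log in which each client tracks its own read position. The classes are illustrative stand-ins for a real service such as Event Hubs or Kafka, not an actual client SDK.

```python
# Minimal sketch of the event stream model: events are appended to an
# ordered log, and each client owns its offset, so it can join at any
# time and replay events from any position.
class EventLog:
    def __init__(self):
        self._events = []                   # append-only, strictly ordered

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1        # offset of the new event

    def read(self, offset, max_count=10):
        """Return up to max_count events starting at offset."""
        return self._events[offset:offset + max_count]


class StreamClient:
    """The log does not track subscribers; each client advances its own position."""
    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)           # client-managed cursor
        return batch


log = EventLog()
for i in range(3):
    log.append({"id": i, "reading": 20 + i})

late_joiner = StreamClient(log)             # joins after events were published...
replayed = late_joiner.poll()               # ...and can still read all of them
print(len(replayed))  # 3
```

Contrast this with pub/sub, where the infrastructure pushes each published event to current subscribers and a late joiner would see nothing.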
On the consumer side, there are some common variations:
Simple event processing . An event immediately triggers an action in the consumer. For example, you
could use Azure Functions with a Service Bus trigger, so that a function executes whenever a message is
published to a Service Bus topic.
Complex event processing . A consumer processes a series of events, looking for patterns in the event
data, using a technology such as Azure Stream Analytics or Apache Storm. For example, you could
aggregate readings from an embedded device over a time window, and generate a notification if the
moving average crosses a certain threshold.
Event stream processing . Use a data streaming platform, such as Azure IoT Hub or Apache Kafka, as a
pipeline to ingest events and feed them to stream processors. The stream processors act to process or
transform the stream. There may be multiple stream processors for different subsystems of the
application. This approach is a good fit for IoT workloads.
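The windowed aggregation described under complex event processing can be sketched as follows. In practice this logic would run inside a stream processor such as Azure Stream Analytics; the fixed-size window here is a simplification of a rolling time window.

```python
# Hedged sketch of complex event processing: aggregate readings over a
# rolling window and raise an alert when the moving average crosses a
# threshold.
from collections import deque

class MovingAverageAlert:
    def __init__(self, window_size, threshold):
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def on_event(self, reading):
        """Returns True once the window is full and its average exceeds the threshold."""
        self.window.append(reading)
        if len(self.window) < self.window.maxlen:
            return False                    # not enough data yet
        return sum(self.window) / len(self.window) > self.threshold


detector = MovingAverageAlert(window_size=3, threshold=75.0)
alerts = [detector.on_event(r) for r in [70, 72, 74, 80, 85]]
print(alerts)  # [False, False, False, True, True]
```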
The source of the events may be external to the system, such as physical devices in an IoT solution. In that case,
the system must be able to ingest the data at the volume and throughput that is required by the data source.
In the logical diagram above, each type of consumer is shown as a single box. In practice, it's common to have
multiple instances of a consumer, to avoid having the consumer become a single point of failure in the system.
Multiple instances might also be necessary to handle the volume and frequency of events. Also, a single
consumer might process events on multiple threads. This can create challenges if events must be processed in
order or require exactly-once semantics. See Minimize Coordination.

When to use this architecture


Multiple subsystems must process the same events.
Real-time processing with minimum time lag.
Complex event processing, such as pattern matching or aggregation over time windows.
High volume and high velocity of data, such as IoT.

Benefits
Producers and consumers are decoupled.
No point-to-point integrations. It's easy to add new consumers to the system.
Consumers can respond to events immediately as they arrive.
Highly scalable and distributed.
Subsystems have independent views of the event stream.

Challenges
Guaranteed delivery. In some systems, especially in IoT scenarios, it's crucial to guarantee that events are
delivered.
Processing events in order or exactly once. Each consumer type typically runs in multiple instances, for
resiliency and scalability. This can create a challenge if the events must be processed in order (within a
consumer type), or if the processing logic is not idempotent.
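One common mitigation, sketched below under the assumption of at-least-once delivery, is to make the processing logic idempotent by remembering the IDs of events already handled, so that a redelivered event has no extra effect.

```python
# Idempotent consumer sketch: duplicates are detected by event ID and
# skipped, so the side effect happens at most once per logical event.
class IdempotentConsumer:
    def __init__(self):
        self._seen = set()
        self.total = 0

    def handle(self, event):
        if event["id"] in self._seen:
            return False                    # duplicate: safe to ignore
        self._seen.add(event["id"])
        self.total += event["amount"]       # the actual side effect
        return True


consumer = IdempotentConsumer()
events = [{"id": "a", "amount": 5}, {"id": "b", "amount": 7},
          {"id": "a", "amount": 5}]         # "a" is redelivered
for e in events:
    consumer.handle(e)
print(consumer.total)  # 12
```

In a real system the set of seen IDs would live in durable storage shared by the consumer instances, which is itself a coordination cost to weigh.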
Additional considerations
The amount of data to include in an event can be a significant consideration that affects both performance
and cost. Putting all the relevant information needed for processing in the event itself can simplify the
processing code and save additional lookups. Putting the minimal amount of information in an event, like
just a couple of identifiers, will reduce transport time and cost, but requires the processing code to look up
any additional information it needs. For more information on this, take a look at this blog post.
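The tradeoff can be sketched as follows; the event shapes and the lookup store are hypothetical, purely for illustration.

```python
# "Fat" events carry everything the processor needs; "thin" events carry
# only identifiers and force a lookup against a reference store.
PRODUCT_STORE = {"sku-1": {"name": "Widget", "price": 9.99}}  # assumed store

def process_fat(event):
    # All data travels in the event: larger payload, no extra lookup.
    return f'{event["name"]} @ {event["price"]}'

def process_thin(event, store):
    # Minimal payload: cheaper to transport, but the consumer must look up
    # the rest, adding latency and a dependency on the store.
    product = store[event["sku"]]
    return f'{product["name"]} @ {product["price"]}'


fat = {"sku": "sku-1", "name": "Widget", "price": 9.99}
thin = {"sku": "sku-1"}
print(process_fat(fat) == process_thin(thin, PRODUCT_STORE))  # True
```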
Microservices architecture style
3/10/2022 • 6 minutes to read

A microservices architecture consists of a collection of small, autonomous services. Each service is self-
contained and should implement a single business capability within a bounded context. A bounded context is a
natural division within a business and provides an explicit boundary within which a domain model exists.

What are microservices?


Microservices are small, independent, and loosely coupled. A single small team of developers can write
and maintain a service.
Each service is a separate codebase, which can be managed by a small development team.
Services can be deployed independently. A team can update an existing service without rebuilding and
redeploying the entire application.
Services are responsible for persisting their own data or external state. This differs from the traditional
model, where a separate data layer handles data persistence.
Services communicate with each other by using well-defined APIs. Internal implementation details of
each service are hidden from other services.
Supports polyglot programming. For example, services don't need to share the same technology stack,
libraries, or frameworks.
Besides the services themselves, some other components appear in a typical microservices architecture:
Management/orchestration . This component is responsible for placing services on nodes, identifying
failures, rebalancing services across nodes, and so forth. Typically this component is an off-the-shelf technology
such as Kubernetes, rather than something custom built.
API Gateway . The API gateway is the entry point for clients. Instead of calling services directly, clients call the
API gateway, which forwards the call to the appropriate services on the back end.
Advantages of using an API gateway include:
It decouples clients from services. Services can be versioned or refactored without needing to update all
of the clients.
Services can use messaging protocols that are not web friendly, such as AMQP.
The API Gateway can perform other cross-cutting functions such as authentication, logging, SSL
termination, and load balancing.
Out-of-the-box policies, like for throttling, caching, transformation, or validation.
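The routing role of the gateway can be sketched as follows; the route table and backend services are illustrative stand-ins, not a real gateway implementation.

```python
# Minimal gateway sketch: clients call one entry point, and the gateway
# forwards to the matching backend. Backends can be versioned or replaced
# without the client changing.
def catalog_service(path):
    return {"service": "catalog", "path": path}

def orders_service(path):
    return {"service": "orders", "path": path}

ROUTES = {
    "/catalog": catalog_service,
    "/orders": orders_service,
}

def gateway(path):
    # Cross-cutting concerns (auth, logging, throttling) would hook in here.
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend(path)
    raise LookupError(f"no route for {path}")


print(gateway("/orders/42")["service"])  # orders
```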

Benefits
Agility. Because microservices are deployed independently, it's easier to manage bug fixes and feature
releases. You can update a service without redeploying the entire application, and roll back an update if
something goes wrong. In many traditional applications, if a bug is found in one part of the application, it
can block the entire release process. New features may be held up waiting for a bug fix to be integrated,
tested, and published.
Small, focused teams . A microservice should be small enough that a single feature team can build, test,
and deploy it. Small team sizes promote greater agility. Large teams tend to be less productive, because
communication is slower, management overhead goes up, and agility diminishes.
Small code base . In a monolithic application, there is a tendency over time for code dependencies to
become tangled. Adding a new feature requires touching code in a lot of places. By not sharing code or
data stores, a microservices architecture minimizes dependencies, and that makes it easier to add new
features.
Mix of technologies . Teams can pick the technology that best fits their service, using a mix of
technology stacks as appropriate.
Fault isolation . If an individual microservice becomes unavailable, it won't disrupt the entire application,
as long as any upstream microservices are designed to handle faults correctly (for example, by
implementing circuit breaking).
Scalability . Services can be scaled independently, letting you scale out subsystems that require more
resources, without scaling out the entire application. Using an orchestrator such as Kubernetes or Service
Fabric, you can pack a higher density of services onto a single host, which allows for more efficient
utilization of resources.
Data isolation . It is much easier to perform schema updates, because only a single microservice is
affected. In a monolithic application, schema updates can become very challenging, because different
parts of the application may all touch the same data, making any alterations to the schema risky.

Challenges
The benefits of microservices don't come for free. Here are some of the challenges to consider before
embarking on a microservices architecture.
Complexity . A microservices application has more moving parts than the equivalent monolithic
application. Each service is simpler, but the entire system as a whole is more complex.
Development and testing . Writing a small service that relies on other dependent services requires a
different approach than writing a traditional monolithic or layered application. Existing tools are not
always designed to work with service dependencies. Refactoring across service boundaries can be
difficult. It is also challenging to test service dependencies, especially when the application is evolving
quickly.
Lack of governance . The decentralized approach to building microservices has advantages, but it can
also lead to problems. You may end up with so many different languages and frameworks that the
application becomes hard to maintain. It may be useful to put some project-wide standards in place,
without overly restricting teams' flexibility. This especially applies to cross-cutting functionality such as
logging.
Network congestion and latency . The use of many small, granular services can result in more
interservice communication. Also, if the chain of service dependencies gets too long (service A calls B,
which calls C...), the additional latency can become a problem. You will need to design APIs carefully.
Avoid overly chatty APIs, think about serialization formats, and look for places to use asynchronous
communication patterns like queue-based load leveling.
Data integrity . Each microservice is responsible for its own data persistence. As a result, data
consistency can be a challenge. Embrace eventual consistency where possible.
Management . Success with microservices requires a mature DevOps culture. Correlated logging
across services can be challenging. Typically, logging must correlate multiple service calls for a single
user operation.
Versioning . Updates to a service must not break services that depend on it. Multiple services could be
updated at any given time, so without careful design, you might have problems with backward or
forward compatibility.
Skill set . Microservices are highly distributed systems. Carefully evaluate whether the team has the skills
and experience to be successful.

Best practices
Model services around the business domain.
Decentralize everything. Individual teams are responsible for designing and building services. Avoid
sharing code or data schemas.
Data storage should be private to the service that owns the data. Use the best storage for each service
and data type.
Services communicate through well-designed APIs. Avoid leaking implementation details. APIs should
model the domain, not the internal implementation of the service.
Avoid coupling between services. Causes of coupling include shared database schemas and rigid
communication protocols.
Offload cross-cutting concerns, such as authentication and SSL termination, to the gateway.
Keep domain knowledge out of the gateway. The gateway should handle and route client requests
without any knowledge of the business rules or domain logic. Otherwise, the gateway becomes a
dependency and can cause coupling between services.
Services should have loose coupling and high functional cohesion. Functions that are likely to change
together should be packaged and deployed together. If they reside in separate services, those services
end up being tightly coupled, because a change in one service will require updating the other service.
Overly chatty communication between two services may be a symptom of tight coupling and low
cohesion.
Isolate failures. Use resiliency strategies to prevent failures within a service from cascading. See
Resiliency patterns and Designing reliable applications.

Next steps
For detailed guidance about building a microservices architecture on Azure, see Designing, building, and
operating microservices on Azure.
N-tier architecture style
3/10/2022 • 5 minutes to read

An N-tier architecture divides an application into logical layers and physical tiers .

Layers are a way to separate responsibilities and manage dependencies. Each layer has a specific responsibility.
A higher layer can use services in a lower layer, but not the other way around.
Tiers are physically separated, running on separate machines. A tier can call to another tier directly, or use
asynchronous messaging (message queue). Although each layer might be hosted in its own tier, that's not
required. Several layers might be hosted on the same tier. Physically separating the tiers improves scalability
and resiliency, but also adds latency from the additional network communication.
A traditional three-tier application has a presentation tier, a middle tier, and a database tier. The middle tier is
optional. More complex applications can have more than three tiers. The diagram above shows an application
with two middle tiers, encapsulating different areas of functionality.
An N-tier application can have a closed layer architecture or an open layer architecture :
In a closed layer architecture, a layer can only call the next layer immediately down.
In an open layer architecture, a layer can call any of the layers below it.
A closed layer architecture limits the dependencies between layers. However, it might create unnecessary
network traffic, if one layer simply passes requests along to the next layer.
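The difference between the two rules can be sketched as a simple dependency check; the layer names are illustrative.

```python
# Closed vs. open layering: in a closed architecture a layer may call only
# the layer immediately below it; in an open architecture it may call any
# lower layer.
LAYERS = ["presentation", "business", "data"]   # highest to lowest

def can_call(caller, callee, closed=True):
    ci, ce = LAYERS.index(caller), LAYERS.index(callee)
    if closed:
        return ce == ci + 1                     # only the next layer down
    return ce > ci                              # any lower layer


print(can_call("presentation", "business"))           # True
print(can_call("presentation", "data"))                # False (closed)
print(can_call("presentation", "data", closed=False))  # True (open)
```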

When to use this architecture


N-tier architectures are typically implemented as infrastructure-as-a-service (IaaS) applications, with each tier
running on a separate set of VMs. However, an N-tier application doesn't need to be pure IaaS. Often, it's
advantageous to use managed services for some parts of the architecture, particularly caching, messaging, and
data storage.
Consider an N-tier architecture for:
Simple web applications.
Migrating an on-premises application to Azure with minimal refactoring.
Unified development of on-premises and cloud applications.
N-tier architectures are very common in traditional on-premises applications, so this style is a natural fit for migrating
existing workloads to Azure.

Benefits
Portability between cloud and on-premises, and between cloud platforms.
Lower learning curve for most developers.
Natural evolution from the traditional application model.
Open to heterogeneous environments (Windows/Linux).

Challenges
It's easy to end up with a middle tier that just does CRUD operations on the database, adding extra latency
without doing any useful work.
Monolithic design prevents independent deployment of features.
Managing an IaaS application is more work than an application that uses only managed services.
It can be difficult to manage network security in a large system.

Best practices
Use autoscaling to handle changes in load. See Autoscaling best practices.
Use asynchronous messaging to decouple tiers.
Cache semistatic data. See Caching best practices.
Configure the database tier for high availability, using a solution such as SQL Server Always On availability
groups.
Place a web application firewall (WAF) between the front end and the Internet.
Place each tier in its own subnet, and use subnets as a security boundary.
Restrict access to the data tier, by allowing requests only from the middle tier(s).

N-tier architecture on virtual machines


This section describes a recommended N-tier architecture running on VMs.

Each tier consists of two or more VMs, placed in an availability set or virtual machine scale set. Multiple VMs
provide resiliency in case one VM fails. Load balancers are used to distribute requests across the VMs in a tier. A
tier can be scaled horizontally by adding more VMs to the pool.
Each tier is also placed inside its own subnet, meaning its internal IP addresses fall within the same address
range. That makes it easy to apply network security group rules and route tables to individual tiers.
The web and business tiers are stateless. Any VM can handle any request for that tier. The data tier should
consist of a replicated database. For Windows, we recommend SQL Server, using Always On availability groups
for high availability. For Linux, choose a database that supports replication, such as Apache Cassandra.
Network security groups restrict access to each tier. For example, the database tier only allows access from the
business tier.

NOTE
The layer labeled "Business Tier" in our reference diagram is a moniker for the business logic tier. Likewise, we also call the
presentation tier the "Web Tier." In our example, this is a web application, though multi-tier architectures can be used for
other topologies as well (like desktop apps). Name your tiers what works best for your team to communicate the intent of
that logical and/or physical tier in your application - you could even express that naming in resources you choose to
represent that tier (e.g. vmss-appName-business-layer).

For more information about running N-tier applications on Azure:


Run Windows VMs for an N-tier application
Windows N-tier application on Azure with SQL Server
Microsoft Learn module: Tour the N-tier architecture style
Azure Bastion
Additional considerations
N-tier architectures are not restricted to three tiers. For more complex applications, it is common to have
more tiers. In that case, consider using layer-7 routing to route requests to a particular tier.
Tiers are the boundary of scalability, reliability, and security. Consider having separate tiers for services
with different requirements in those areas.
Use virtual machine scale sets for autoscaling.
Look for places in the architecture where you can use a managed service without significant refactoring.
In particular, look at caching, messaging, storage, and databases.
For higher security, place a network DMZ in front of the application. The DMZ includes network virtual
appliances (NVAs) that implement security functionality such as firewalls and packet inspection. For more
information, see Network DMZ reference architecture.
For high availability, place two or more NVAs in an availability set, with an external load balancer to
distribute Internet requests across the instances. For more information, see Deploy highly available
network virtual appliances.
Do not allow direct RDP or SSH access to VMs that are running application code. Instead, operators
should log into a jumpbox, also called a bastion host. This is a VM on the network that administrators use
to connect to the other VMs. The jumpbox has a network security group that allows RDP or SSH only
from approved public IP addresses.
You can extend the Azure virtual network to your on-premises network using a site-to-site virtual private
network (VPN) or Azure ExpressRoute. For more information, see Hybrid network reference architecture.
If your organization uses Active Directory to manage identity, you may want to extend your Active
Directory environment to the Azure VNet. For more information, see Identity management reference
architecture.
If you need higher availability than the Azure SLA for VMs provides, replicate the application across two
regions and use Azure Traffic Manager for failover. For more information, see Run Windows VMs in
multiple regions or Run Linux VMs in multiple regions.
Web-Queue-Worker architecture style
3/10/2022 • 3 minutes to read

The core components of this architecture are a web front end that serves client requests, and a worker that
performs resource-intensive tasks, long-running workflows, or batch jobs. The web front end communicates
with the worker through a message queue .

Other components that are commonly incorporated into this architecture include:
One or more databases.
A cache to store values from the database for quick reads.
A CDN to serve static content.
Remote services, such as email or SMS service. Often these are provided by third parties.
Identity provider for authentication.
The web and worker are both stateless. Session state can be stored in a distributed cache. Any long-running
work is done asynchronously by the worker. The worker can be triggered by messages on the queue, or run on a
schedule for batch processing. The worker is an optional component. If there are no long-running operations,
the worker can be omitted.
The front end might consist of a web API. On the client side, the web API can be consumed by a single-page
application that makes AJAX calls, or by a native client application.
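A minimal sketch of the flow, using an in-process queue and thread as stand-ins for an Azure Storage queue (or Service Bus) and a worker role: the front end accepts a request, enqueues the work, and returns immediately, while the worker drains the queue asynchronously.

```python
# Web-Queue-Worker sketch: the front end never does the long-running work
# itself; the queue decouples it from the worker.
import queue
import threading

work_queue = queue.Queue()
results = []

def front_end(job):
    work_queue.put(job)                 # accept the request, return right away
    return "202 Accepted"

def worker():
    while True:
        job = work_queue.get()
        if job is None:                 # sentinel to stop the worker
            break
        results.append(f"processed {job}")   # the long-running task
        work_queue.task_done()


t = threading.Thread(target=worker)
t.start()
for status in (front_end("resize-image"), front_end("send-report")):
    assert status == "202 Accepted"
work_queue.put(None)
t.join()
print(results)  # ['processed resize-image', 'processed send-report']
```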

When to use this architecture


The Web-Queue-Worker architecture is typically implemented using managed compute services, either Azure
App Service or Azure Cloud Services.
Consider this architecture style for:
Applications with a relatively simple domain.
Applications with some long-running workflows or batch operations.
When you want to use managed services, rather than infrastructure as a service (IaaS).

Benefits
Relatively simple architecture that is easy to understand.
Easy to deploy and manage.
Clear separation of concerns.
The front end is decoupled from the worker using asynchronous messaging.
The front end and the worker can be scaled independently.

Challenges
Without careful design, the front end and the worker can become large, monolithic components that are
difficult to maintain and update.
There may be hidden dependencies, if the front end and worker share data schemas or code modules.

Best practices
Expose a well-designed API to the client. See API design best practices.
Autoscale to handle changes in load. See Autoscaling best practices.
Cache semi-static data. See Caching best practices.
Use a CDN to host static content. See CDN best practices.
Use polyglot persistence when appropriate. See Use the best data store for the job.
Partition data to improve scalability, reduce contention, and optimize performance. See Data partitioning best
practices.

Web-Queue-Worker on Azure App Service


This section describes a recommended Web-Queue-Worker architecture that uses Azure App Service.

The front end is implemented as an Azure App Service web app, and the worker is implemented as an
Azure Functions app. The web app and the function app are both associated with an App Service plan that
provides the VM instances.
You can use either Azure Service Bus or Azure Storage queues for the message queue. (The diagram
shows an Azure Storage queue.)
Azure Cache for Redis stores session state and other data that needs low latency access.
Azure CDN is used to cache static content such as images, CSS, or HTML.
For storage, choose the storage technologies that best fit the needs of the application. You might use
multiple storage technologies (polyglot persistence). To illustrate this idea, the diagram shows Azure SQL
Database and Azure Cosmos DB.
For more details, see App Service web application reference architecture.
Additional considerations
Not every transaction has to go through the queue and worker to storage. The web front end can
perform simple read/write operations directly. Workers are designed for resource-intensive tasks or
long-running workflows. In some cases, you might not need a worker at all.
Use the built-in autoscale feature of App Service to scale out the number of VM instances. If the load on
the application follows predictable patterns, use schedule-based autoscale. If the load is unpredictable,
use metrics-based autoscaling rules.
Consider putting the web app and the function app into separate App Service plans. That way, they can be
scaled independently.
Use separate App Service plans for production and testing. Otherwise, if you use the same plan for
production and testing, it means your tests are running on your production VMs.
Use deployment slots to manage deployments. This lets you deploy an updated version to a staging
slot, then swap over to the new version. It also lets you swap back to the previous version, if there was a
problem with the update.
Ten design principles for Azure applications
3/10/2022 • 2 minutes to read

Follow these design principles to make your application more scalable, resilient, and manageable.
Design for self healing . In a distributed system, failures happen. Design your application to be self healing
when failures occur.
Make all things redundant . Build redundancy into your application, to avoid having single points of failure.
Minimize coordination . Minimize coordination between application services to achieve scalability.
Design to scale out . Design your application so that it can scale horizontally, adding or removing new
instances as demand requires.
Par tition around limits . Use partitioning to work around database, network, and compute limits.
Design for operations . Design your application so that the operations team has the tools they need.
Use managed ser vices . When possible, use platform as a service (PaaS) rather than infrastructure as a service
(IaaS).
Use the best data store for the job . Pick the storage technology that is the best fit for your data and how it
will be used.
Design for evolution . All successful applications change over time. An evolutionary design is key for
continuous innovation.
Build for the needs of business . Every design decision must be justified by a business requirement.
Design for self healing
3/10/2022 • 4 minutes to read

Design your application to be self healing when failures occur


In a distributed system, failures can happen. Hardware can fail. The network can have transient failures. Rarely,
an entire service or region may experience a disruption, but even those must be planned for.
Therefore, design an application to be self healing when failures occur. This requires a three-pronged approach:
Detect failures.
Respond to failures gracefully.
Log and monitor failures, to give operational insight.
How you respond to a particular type of failure may depend on your application's availability requirements. For
example, if you require very high availability, you might automatically fail over to a secondary region during a
regional outage. However, that will incur a higher cost than a single-region deployment.
Also, don't just consider big events like regional outages, which are generally rare. You should focus as much, if
not more, on handling local, short-lived failures, such as network connectivity failures or failed database
connections.

Recommendations
Retr y failed operations . Transient failures may occur due to momentary loss of network connectivity, a
dropped database connection, or a timeout when a service is busy. Build retry logic into your application to
handle transient failures. For many Azure services, the client SDK implements automatic retries. For more
information, see Transient fault handling and the Retry pattern.
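A minimal sketch of the pattern with exponential backoff; the delays and retry counts are illustrative, and production code should also cap the total elapsed time.

```python
# Retry pattern sketch: retry a transiently failing operation, doubling the
# delay between attempts, and give up after a fixed number of tries.
import time

def retry(operation, attempts=3, base_delay=0.01):
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                        # give up: failure is not transient
            time.sleep(base_delay * (2 ** attempt))


calls = {"count": 0}
def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient")   # fails twice, then succeeds
    return "ok"

print(retry(flaky))  # ok
```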
Protect failing remote ser vices (Circuit Breaker) . It's good to retry after a transient failure, but if the failure
persists, you can end up with too many callers hammering a failing service. This can lead to cascading failures,
as requests back up. Use the Circuit Breaker pattern to fail fast (without making the remote call) when an
operation is likely to fail.
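A simplified sketch of the pattern is shown below; a production circuit breaker would also move to a half-open state after a timeout to probe whether the service has recovered.

```python
# Circuit Breaker sketch: after enough consecutive failures the circuit
# opens, and further calls fail fast without touching the remote service.
class CircuitBreaker:
    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.state = "closed"

    def call(self, operation):
        if self.state == "open":
            raise RuntimeError("circuit open: failing fast")
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.state = "open"
            raise
        self.failures = 0                   # success resets the count
        return result


breaker = CircuitBreaker(failure_threshold=2)
def failing():
    raise ConnectionError("service down")

for _ in range(2):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
print(breaker.state)  # open
```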
Isolate critical resources (Bulkhead) . Failures in one subsystem can sometimes cascade. This can happen if a
failure causes some resources, such as threads or sockets, not to get freed in a timely manner, leading to
resource exhaustion. To avoid this, partition a system into isolated groups, so that a failure in one partition does
not bring down the entire system.
Perform load leveling . Applications may experience sudden spikes in traffic that can overwhelm services on
the backend. To avoid this, use the Queue-Based Load Leveling pattern to queue work items to run
asynchronously. The queue acts as a buffer that smooths out peaks in the load.
Fail over . If an instance can't be reached, fail over to another instance. For things that are stateless, like a web
server, put several instances behind a load balancer or traffic manager. For things that store state, like a
database, use replicas and fail over. Depending on the data store and how it replicates, this may require the
application to deal with eventual consistency.
Compensate failed transactions . In general, avoid distributed transactions, as they require coordination
across services and resources. Instead, compose an operation from smaller individual transactions. If the
operation fails midway through, use Compensating Transactions to undo any step that already completed.
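The idea can be sketched as a list of steps, each paired with a compensating action that runs in reverse order if a later step fails; the booking steps here are hypothetical.

```python
# Compensating Transactions sketch: on failure, undo the steps that already
# completed, newest first.
def run_with_compensation(steps):
    """steps: list of (do, undo) callables. Returns True on full success."""
    completed = []
    for do, undo in steps:
        try:
            do()
            completed.append(undo)
        except Exception:
            for compensate in reversed(completed):
                compensate()                # undo what already happened
            return False
    return True


log = []
book_flight = (lambda: log.append("flight booked"),
               lambda: log.append("flight cancelled"))

def fail_book_hotel():
    raise RuntimeError("no rooms")
book_hotel = (fail_book_hotel, lambda: log.append("hotel cancelled"))

ok = run_with_compensation([book_flight, book_hotel])
print(ok, log)  # False ['flight booked', 'flight cancelled']
```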
Checkpoint long-running transactions . Checkpoints can provide resiliency if a long-running operation fails.
When the operation restarts (for example, it is picked up by another VM), it can be resumed from the last
checkpoint.
Degrade gracefully . Sometimes you can't work around a problem, but you can provide reduced functionality
that is still useful. Consider an application that shows a catalog of books. If the application can't retrieve the
thumbnail image for the cover, it might show a placeholder image. Entire subsystems might be noncritical for
the application. For example, in an e-commerce site, showing product recommendations is probably less critical
than processing orders.
Throttle clients . Sometimes a small number of users create excessive load, which can reduce your application's
availability for other users. In this situation, throttle the client for a certain period of time. See the Throttling
pattern.
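A minimal sketch of per-client throttling with a fixed-window counter; the quota and the window reset are simplified, and the Throttling pattern covers fuller strategies such as token buckets.

```python
# Throttling sketch: each client gets a quota per time window; requests over
# the quota are rejected until the window resets.
class Throttler:
    def __init__(self, limit_per_window):
        self.limit = limit_per_window
        self.counts = {}

    def allow(self, client_id):
        used = self.counts.get(client_id, 0)
        if used >= self.limit:
            return False                    # over quota: reject this request
        self.counts[client_id] = used + 1
        return True

    def reset_window(self):
        self.counts.clear()                 # called on a timer in a real system


throttler = Throttler(limit_per_window=2)
decisions = [throttler.allow("noisy-client") for _ in range(3)]
print(decisions)  # [True, True, False]
```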
Block bad actors . Just because you throttle a client, it doesn't mean the client was acting maliciously. It just
means the client exceeded its service quota. But if a client consistently exceeds its quota or otherwise behaves
badly, you might block it. Define an out-of-band process for users to request getting unblocked.
Use leader election . When you need to coordinate a task, use Leader Election to select a coordinator. That way,
the coordinator is not a single point of failure. If the coordinator fails, a new one is selected. Rather than
implement a leader election algorithm from scratch, consider an off-the-shelf solution such as Zookeeper.
Test with fault injection . All too often, the success path is well tested but not the failure path. A system could
run in production for a long time before a failure path is exercised. Use fault injection to test the resiliency of the
system to failures, either by triggering actual failures or by simulating them.
Embrace chaos engineering . Chaos engineering extends the notion of fault injection, by randomly injecting
failures or abnormal conditions into production instances.
For a structured approach to making your applications self healing, see Design reliable applications for Azure.
Make all things redundant

Build redundancy into your application, to avoid having single points of failure
A resilient application routes around failure. Identify the critical paths in your application. Is there redundancy at
each point in the path? When a subsystem fails, will the application fail over to something else?

Recommendations
Consider business requirements . The amount of redundancy built into a system can affect both cost and
complexity. Your architecture should be informed by your business requirements, such as recovery time
objective (RTO). For example, a multi-region deployment is more expensive than a single-region deployment,
and is more complicated to manage. You will need operational procedures to handle failover and failback. The
additional cost and complexity might be justified for some business scenarios and not others.
Place VMs behind a load balancer . Don't use a single VM for mission-critical workloads. Instead, place
multiple VMs behind a load balancer. If any VM becomes unavailable, the load balancer distributes traffic to the
remaining healthy VMs. To learn how to deploy this configuration, see Multiple VMs for scalability and
availability.

Replicate databases . Azure SQL Database and Cosmos DB automatically replicate the data within a region,
and you can enable geo-replication across regions. If you are using an IaaS database solution, choose one that
supports replication and failover, such as SQL Server Always On availability groups.
Enable geo-replication . Geo-replication for Azure SQL Database and Cosmos DB creates secondary readable
replicas of your data in one or more secondary regions. In the event of an outage, the database can fail over to
the secondary region for writes.
Par tition for availability . Database partitioning is often used to improve scalability, but it can also improve
availability. If one shard goes down, the other shards can still be reached. A failure in one shard will only disrupt
a subset of the total transactions.
Deploy to more than one region . For the highest availability, deploy the application to more than one
region. That way, in the rare case when a problem affects an entire region, the application can fail over to
another region. The following diagram shows a multi-region application that uses Azure Traffic Manager to
handle failover.
Synchronize front and backend failover . Use Azure Traffic Manager to fail over the front end. If the front
end becomes unreachable in one region, Traffic Manager will route new requests to the secondary region.
Depending on your database solution, you may need to coordinate failing over the database.
Use automatic failover but manual failback . Use Traffic Manager for automatic failover, but not for
automatic failback. Automatic failback carries a risk that you might switch to the primary region before the
region is completely healthy. Instead, verify that all application subsystems are healthy before manually failing
back. Also, depending on the database, you might need to check data consistency before failing back.
Include redundancy for Traffic Manager . Traffic Manager is a possible failure point. Review the Traffic
Manager SLA, and determine whether using Traffic Manager alone meets your business requirements for high
availability. If not, consider adding another traffic management solution as a failback. If the Azure Traffic
Manager service fails, change your CNAME records in DNS to point to the other traffic management service.
Design to scale out

Design your application so that it can scale horizontally


A primary advantage of the cloud is elastic scaling — the ability to use as much capacity as you need, scaling out
as load increases, and scaling in when the extra capacity is not needed. Design your application so that it can
scale horizontally, adding or removing new instances as demand requires.

Recommendations
Avoid instance stickiness . Stickiness, or session affinity, is when requests from the same client are always
routed to the same server. Stickiness limits the application's ability to scale out. For example, traffic from a high-
volume user will not be distributed across instances. Causes of stickiness include storing session state in
memory, and using machine-specific keys for encryption. Make sure that any instance can handle any request.
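The "any instance can handle any request" idea can be sketched by moving session state into a shared store. The class below is an in-memory stand-in for an external distributed cache (for example, Redis); the names are illustrative:

```python
class SharedSessionStore:
    """Stand-in for an external session store (e.g. a distributed cache).
    Because state lives outside the web server, any instance can handle
    any request, and instances can be added or removed freely."""
    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, state):
        self._data[session_id] = state

store = SharedSessionStore()

def handle_request(instance_name, session_id, item):
    # Any instance reads and writes the same shared store: no affinity needed.
    cart = store.get(session_id)
    cart.setdefault("items", []).append(item)
    store.put(session_id, cart)
    return f"{instance_name} added {item}"

handle_request("web-1", "sess-42", "book")
handle_request("web-2", "sess-42", "lamp")   # different instance, same session
print(store.get("sess-42")["items"])         # ['book', 'lamp']
```

With in-memory session state, the second request would have failed or lost the cart unless it hit the same server.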
Identify bottlenecks . Scaling out isn't a magic fix for every performance issue. For example, if your backend
database is the bottleneck, it won't help to add more web servers. Identify and resolve the bottlenecks in the
system first, before throwing more instances at the problem. Stateful parts of the system are the most likely
cause of bottlenecks.
Decompose workloads by scalability requirements. Applications often consist of multiple workloads, with
different requirements for scaling. For example, an application might have a public-facing site and a separate
administration site. The public site may experience sudden surges in traffic, while the administration site has a
smaller, more predictable load.
Offload resource-intensive tasks. Tasks that require a lot of CPU or I/O resources should be moved to
background jobs when possible, to minimize the load on the front end that is handling user requests.
Use built-in autoscaling features . Many Azure compute services have built-in support for autoscaling. If the
application has a predictable, regular workload, scale out on a schedule. For example, scale out during business
hours. Otherwise, if the workload is not predictable, use performance metrics such as CPU or request queue
length to trigger autoscaling. For autoscaling best practices, see Autoscaling.
Consider aggressive autoscaling for critical workloads . For critical workloads, you want to keep ahead of
demand. It's better to add new instances quickly under heavy load to handle the additional traffic, and then
gradually scale back.
Design for scale in . Remember that with elastic scale, the application will have periods of scale in, when
instances get removed. The application must gracefully handle instances being removed. Here are some ways to
handle scale-in:
Listen for shutdown events (when available) and shut down cleanly.
Clients/consumers of a service should support transient fault handling and retry.
For long-running tasks, consider breaking up the work, using checkpoints or the Pipes and Filters pattern.
Put work items on a queue so that another instance can pick up the work, if an instance is removed in the
middle of processing.
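The last two bullets can be combined: pull work from a shared queue, checkpoint progress, and requeue unfinished work on shutdown. This is an illustrative sketch with a hypothetical fixed task size, not a specific Azure queue API:

```python
import queue

def run_worker(work_queue, checkpoints, shutdown_after=None):
    """Pull tasks from a shared queue, checkpointing progress after each
    unit of work. If the instance is removed mid-task (scale-in), it
    records a checkpoint and hands the task back for another instance."""
    processed, units_done = [], 0
    while True:
        try:
            task = work_queue.get_nowait()
        except queue.Empty:
            return processed
        start = checkpoints.get(task, 0)          # resume from the last checkpoint
        for step in range(start, 5):              # 5 units of work per task
            if shutdown_after is not None and units_done >= shutdown_after:
                checkpoints[task] = step          # persist progress...
                work_queue.put(task)              # ...and requeue the task
                return processed                  # shut down cleanly
            units_done += 1
        checkpoints[task] = 5
        processed.append(task)

work = queue.Queue()
work.put("report-1")
checkpoints = {}

run_worker(work, checkpoints, shutdown_after=3)   # instance A is removed after 3 units
print(checkpoints)                                # {'report-1': 3}
print(run_worker(work, checkpoints))              # instance B resumes: ['report-1']
```

In a real system the queue and checkpoint store would both be external services, so they survive the instance that is removed.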
Partition around limits

Use partitioning to work around database, network, and compute limits
In the cloud, all services have limits in their ability to scale up. Azure service limits are documented in Azure
subscription and service limits, quotas, and constraints. Limits include number of cores, database size, query
throughput, and network throughput. If your system grows sufficiently large, you may hit one or more of these
limits. Use partitioning to work around these limits.
There are many ways to partition a system, such as:
Partition a database to avoid limits on database size, data I/O, or number of concurrent sessions.
Partition a queue or message bus to avoid limits on the number of requests or the number of concurrent
connections.
Partition an App Service web app to avoid limits on the number of instances per App Service plan.
A database can be partitioned horizontally, vertically, or functionally.
In horizontal partitioning, also called sharding, each partition holds data for a subset of the total data set.
The partitions share the same data schema. For example, customers whose names start with A–M go into
one partition, N–Z into another partition.
In vertical partitioning, each partition holds a subset of the fields for the items in the data store. For
example, put frequently accessed fields in one partition, and less frequently accessed fields in another.
In functional partitioning, data is partitioned according to how it is used by each bounded context in the
system. For example, store invoice data in one partition and product inventory data in another. The
schemas are independent.
For more detailed guidance, see Data partitioning.

Recommendations
Par tition different par ts of the application . Databases are one obvious candidate for partitioning, but also
consider storage, cache, queues, and compute instances.
Design the par tition key to avoid hotspots . If you partition a database, but one shard still gets the majority
of the requests, then you haven't solved your problem. Ideally, load gets distributed evenly across all the
partitions. For example, hash by customer ID and not the first letter of the customer name, because some letters
are more frequent. The same principle applies when partitioning a message queue. Pick a partition key that
leads to an even distribution of messages across the set of queues. For more information, see Sharding.
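A hash-based partition key along these lines can be sketched as follows; the key format and partition count are illustrative:

```python
import hashlib
from collections import Counter

def partition_for(key, partition_count):
    """Assign a partition by hashing the full key. A stable hash spreads
    load far more evenly than, say, the first letter of a customer name."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# Even sequentially numbered customer IDs distribute evenly:
counts = Counter(partition_for(f"customer-{n}", 4) for n in range(1000))
print(sorted(counts.values()))  # four counts, each close to 250
```

Note that a cryptographic hash is used here only because it is stable across processes; Python's built-in `hash()` is randomized per process and would give different partitions on each run.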
Par tition around Azure subscription and ser vice limits . Individual components and services have limits,
but there are also limits for subscriptions and resource groups. For very large applications, you might need to
partition around those limits.
Par tition at different levels . Consider a database server deployed on a VM. The VM has a VHD that is backed
by Azure Storage. The storage account belongs to an Azure subscription. Notice that each step in the hierarchy
has limits. The database server may have a connection pool limit. VMs have CPU and network limits. Storage has
IOPS limits. The subscription has limits on the number of VM cores. Generally, it's easier to partition lower in the
hierarchy. Only large applications should need to partition at the subscription level.
Design for operations

Design an application so that the operations team has the tools they
need
The cloud has dramatically changed the role of the operations team. They are no longer responsible for
managing the hardware and infrastructure that hosts the application. That said, operations is still a critical part
of running a successful cloud application. Some of the important functions of the operations team include:
Deployment
Monitoring
Escalation
Incident response
Security auditing
Robust logging and tracing are particularly important in cloud applications. Involve the operations team in
design and planning, to ensure the application gives them the data and insight they need to be successful.

Recommendations
Make all things obser vable . Once a solution is deployed and running, logs and traces are your primary
insight into the system. Tracing records a path through the system, and is useful to pinpoint bottlenecks,
performance issues, and failure points. Logging captures individual events such as application state changes,
errors, and exceptions. Log in production, or else you lose insight at the very times when you need it the most.
Instrument for monitoring . Monitoring gives insight into how well (or poorly) an application is performing,
in terms of availability, performance, and system health. For example, monitoring tells you whether you are
meeting your SLA. Monitoring happens during the normal operation of the system. It should be as close to real-
time as possible, so that the operations staff can react to issues quickly. Ideally, monitoring can help avert
problems before they lead to a critical failure. For more information, see Monitoring and diagnostics.
Instrument for root cause analysis . Root cause analysis is the process of finding the underlying cause of
failures. It occurs after a failure has already happened.
Use distributed tracing . Use a distributed tracing system that is designed for concurrency, asynchrony, and
cloud scale. Traces should include a correlation ID that flows across service boundaries. A single operation may
involve calls to multiple application services. If an operation fails, the correlation ID helps to pinpoint the cause
of the failure.
Standardize logs and metrics . The operations team will need to aggregate logs from across the various
services in your solution. If every service uses its own logging format, it becomes difficult or impossible to get
useful information from them. Define a common schema that includes fields such as correlation ID, event name,
IP address of the sender, and so forth. Individual services can derive custom schemas that inherit the base
schema, and contain additional fields.
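A common base schema with service-specific extensions might look like the following. The field names are illustrative, not a prescribed Azure schema:

```python
import json
import datetime

def log_event(correlation_id, event_name, sender_ip, **custom):
    """Emit one JSON log line against a shared base schema; individual
    services add their own fields on top of the common ones."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "correlationId": correlation_id,
        "eventName": event_name,
        "senderIp": sender_ip,
    }
    record.update(custom)   # service-specific extensions of the base schema
    return json.dumps(record)

# Two services, two derived schemas, one aggregatable format:
order_line = log_event("abc-123", "OrderReceived", "10.0.0.4", orderId=17)
payment_line = log_event("abc-123", "PaymentCharged", "10.0.0.9", amountUsd=9.99)
print(order_line)
```

Because every line is JSON with the same base fields, a log aggregator can join the two services' events on `correlationId` without per-service parsing rules.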
Automate management tasks , including provisioning, deployment, and monitoring. Automating a task
makes it repeatable and less prone to human errors.
Treat configuration as code . Check configuration files into a version control system, so that you can track and
version your changes, and roll back if needed.
Use platform as a service (PaaS) options

When possible, use platform as a service (PaaS) rather than infrastructure as a service (IaaS)
IaaS is like having a box of parts. You can build anything, but you have to assemble it yourself. PaaS options are
easier to configure and administer. You don't need to provision VMs, set up VNets, manage patches and updates,
and all of the other overhead associated with running software on a VM.
For example, suppose your application needs a message queue. You could set up your own messaging service
on a VM, using something like RabbitMQ. But Azure Service Bus already provides reliable messaging as a service,
and it's simpler to set up. Just create a Service Bus namespace (which can be done as part of a deployment
script) and then call Service Bus using the client SDK.
Of course, your application may have specific requirements that make an IaaS approach more suitable. However,
even if your application is based on IaaS, look for places where it may be natural to incorporate PaaS options.
These include cache, queues, and data storage.

INSTEAD OF RUNNING...    CONSIDER USING...

Active Directory         Azure Active Directory
Elasticsearch            Azure Search
Hadoop                   HDInsight
IIS                      App Service
MongoDB                  Cosmos DB
Redis                    Azure Cache for Redis
SQL Server               Azure SQL Database
File share               Azure NetApp Files

Please note that this is not meant to be an exhaustive list, but a subset of equivalent options.
Use the best data store for the job

Pick the storage technology that is the best fit for your data and how
it will be used
Gone are the days when you would just stick all of your data into a big relational SQL database. Relational
databases are very good at what they do — providing ACID guarantees for transactions over relational data. But
they come with some costs:
Queries may require expensive joins.
Data must be normalized and conform to a predefined schema (schema on write).
Lock contention may impact performance.
In any large solution, it's likely that a single data store technology won't fill all your needs. Alternatives to
relational databases include key/value stores, document databases, search engine databases, time series
databases, column family databases, and graph databases. Each has pros and cons, and different types of data fit
more naturally into one or another.
For example, you might store a product catalog in a document database, such as Cosmos DB, which allows for a
flexible schema. In that case, each product description is a self-contained document. For queries over the entire
catalog, you might index the catalog and store the index in Azure Search. Product inventory might go into a SQL
database, because that data requires ACID guarantees.
Remember that data includes more than just the persisted application data. It also includes application logs,
events, messages, and caches.

Recommendations
Don't use a relational database for ever ything . Consider other data stores when appropriate. See Choose
the right data store.
Embrace polyglot persistence . In any large solution, it's likely that a single data store technology won't fill all
your needs.
Consider the type of data . For example, put transactional data into SQL, put JSON documents into a
document database, put telemetry data into a time series database, put application logs in Elasticsearch, and put
blobs in Azure Blob Storage.
Prefer availability over (strong) consistency . The CAP theorem implies that a distributed system must
make trade-offs between availability and consistency. (Network partitions, the other leg of the CAP theorem, can
never be completely avoided.) Often, you can achieve higher availability by adopting an eventual consistency
model.
Consider the skillset of the development team . There are advantages to using polyglot persistence, but it's
possible to go overboard. Adopting a new data storage technology requires a new set of skills. The development
team must understand how to get the most out of the technology. They must understand appropriate usage
patterns, how to optimize queries, tune for performance, and so on. Factor this in when considering storage
technologies.
Use compensating transactions . A side effect of polyglot persistence is that a single transaction might write
data to multiple stores. If something fails, use compensating transactions to undo any steps that already
completed.
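The compensation flow can be sketched as a simple saga runner; the step names are illustrative stand-ins for calls to different data stores:

```python
def run_saga(steps):
    """Execute (action, compensation) pairs in order. If any action fails,
    run the compensations for the steps that completed, in reverse order."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()                  # compensating transaction
            return False
        completed.append(compensate)
    return True

log = []

def fail_shipping():
    raise RuntimeError("shipping service unavailable")

steps = [
    (lambda: log.append("charge card"),   lambda: log.append("refund card")),
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (fail_shipping,                       lambda: log.append("cancel shipment")),
]
print(run_saga(steps))  # False
print(log)  # ['charge card', 'reserve stock', 'release stock', 'refund card']
```

Unlike a database rollback, each compensation is an ordinary forward operation (a refund, a stock release), so the individual steps must be designed to be undoable.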
Look at bounded contexts . Bounded context is a term from domain driven design. A bounded context is an
explicit boundary around a domain model, and defines which parts of the domain the model applies to. Ideally, a
bounded context maps to a subdomain of the business domain. The bounded contexts in your system are a
natural place to consider polyglot persistence. For example, "products" may appear in both the Product Catalog
subdomain and the Product Inventory subdomain, but it's very likely that these two subdomains have different
requirements for storing, updating, and querying products.
Design for evolution

An evolutionary design is key for continuous innovation


All successful applications change over time, whether to fix bugs, add new features, bring in new technologies,
or make existing systems more scalable and resilient. If all the parts of an application are tightly coupled, it
becomes very hard to introduce changes into the system. A change in one part of the application may break
another part, or cause changes to ripple through the entire codebase.
This problem is not limited to monolithic applications. An application can be decomposed into services, but still
exhibit the sort of tight coupling that leaves the system rigid and brittle. But when services are designed to
evolve, teams can innovate and continuously deliver new features.
Microservices are becoming a popular way to achieve an evolutionary design, because they address many of
the considerations listed here.

Recommendations
Enforce high cohesion and loose coupling . A service is cohesive if it provides functionality that logically
belongs together. Services are loosely coupled if you can change one service without changing the other. High
cohesion generally means that changes in one function will require changes in other related functions. If you
find that updating a service requires coordinated updates to other services, it may be a sign that your services
are not cohesive. One of the goals of domain-driven design (DDD) is to identify those boundaries.
Encapsulate domain knowledge . When a client consumes a service, the responsibility for enforcing the
business rules of the domain should not fall on the client. Instead, the service should encapsulate all of the
domain knowledge that falls under its responsibility. Otherwise, every client has to enforce the business rules,
and you end up with domain knowledge spread across different parts of the application.
Use asynchronous messaging . Asynchronous messaging is a way to decouple the message producer from
the consumer. The producer does not depend on the consumer responding to the message or taking any
particular action. With a pub/sub architecture, the producer may not even know who is consuming the message.
New services can easily consume the messages without any modifications to the producer.
Don't build domain knowledge into a gateway . Gateways can be useful in a microservices architecture, for
things like request routing, protocol translation, load balancing, or authentication. However, the gateway should
be restricted to this sort of infrastructure functionality. It should not implement any domain knowledge, to avoid
becoming a heavy dependency.
Expose open interfaces . Avoid creating custom translation layers that sit between services. Instead, a service
should expose an API with a well-defined API contract. The API should be versioned, so that you can evolve the
API while maintaining backward compatibility. That way, you can update a service without coordinating updates
to all of the upstream services that depend on it. Public facing services should expose a RESTful API over HTTP.
Backend services might use an RPC-style messaging protocol for performance reasons.
Design and test against ser vice contracts . When services expose well-defined APIs, you can develop and
test against those APIs. That way, you can develop and test an individual service without spinning up all of its
dependent services. (Of course, you would still perform integration and load testing against the real services.)
Abstract infrastructure away from domain logic . Don't let domain logic get mixed up with infrastructure-
related functionality, such as messaging or persistence. Otherwise, changes in the domain logic will require
updates to the infrastructure layers and vice versa.
Offload cross-cutting concerns to a separate ser vice . For example, if several services need to
authenticate requests, you could move this functionality into its own service. Then you could evolve the
authentication service — for example, by adding a new authentication flow — without touching any of the
services that use it.
Deploy ser vices independently . When the DevOps team can deploy a single service independently of other
services in the application, updates can happen more quickly and safely. Bug fixes and new features can be
rolled out at a more regular cadence. Design both the application and the release process to support
independent updates.
Build for the needs of the business

Every design decision must be justified by a business requirement


This design principle may seem obvious, but it's crucial to keep in mind when designing a solution. Do you
anticipate millions of users, or a few thousand? Is a one-hour application outage acceptable? Do you expect
large bursts in traffic or a predictable workload? Ultimately, every design decision must be justified by a
business requirement.

Recommendations
Define business objectives , including the recovery time objective (RTO), recovery point objective (RPO), and
maximum tolerable outage (MTO). These numbers should inform decisions about the architecture. For example,
to achieve a low RTO, you might implement automated failover to a secondary region. But if your solution can
tolerate a higher RTO, that degree of redundancy might be unnecessary.
Document ser vice level agreements (SL As) and ser vice level objectives (SLOs) , including availability
and performance metrics. You might build a solution that delivers 99.95% availability. Is that enough? The
answer is a business decision.
Model the application around the business domain . Start by analyzing the business requirements. Use
these requirements to model the application. Consider using a domain-driven design (DDD) approach to create
domain models that reflect the business processes and use cases.
Capture both functional and nonfunctional requirements . Functional requirements let you judge
whether the application does the right thing. Nonfunctional requirements let you judge whether the application
does those things well. In particular, make sure that you understand your requirements for scalability,
availability, and latency. These requirements will influence design decisions and choice of technology.
Decompose by workload . The term "workload" in this context means a discrete capability or computing task,
which can be logically separated from other tasks. Different workloads may have different requirements for
availability, scalability, data consistency, and disaster recovery.
Plan for growth . A solution might meet your current needs, in terms of number of users, volume of
transactions, data storage, and so forth. However, a robust application can handle growth without major
architectural changes. See Design to scale out and Partition around limits. Also consider that your business
model and business requirements will likely change over time. If an application's service model and data models
are too rigid, it becomes hard to evolve the application for new use cases and scenarios. See Design for
evolution.
Manage costs . In a traditional on-premises application, you pay upfront for hardware as a capital expenditure.
In a cloud application, you pay for the resources that you consume. Make sure that you understand the pricing
model for the services that you consume. The total cost will include network bandwidth usage, storage, IP
addresses, service consumption, and other factors. For more information, see Azure pricing. Also consider your
operations costs. In the cloud, you don't have to manage the hardware or other infrastructure, but you still need
to manage your applications, including DevOps, incident response, disaster recovery, and so forth.
Choose a Kubernetes at the edge compute option

This document discusses the trade-offs for various options available for extending compute on the edge. The
following considerations for each Kubernetes option are covered:
Operational cost. The expected labor required to maintain and operate the Kubernetes clusters.
Ease of configuration. The level of difficulty to configure and deploy a Kubernetes cluster.
Flexibility. A measure of how adaptable the Kubernetes option is to integrate a customized
configuration with existing infrastructure at the edge.
Mixed node. Ability to run a Kubernetes cluster with both Linux and Windows nodes.
Assumptions
You are a cluster operator looking to understand different options for running Kubernetes at the edge
and managing clusters in Azure.
You have a good understanding of existing infrastructure and any other infrastructure requirements,
including storage and networking requirements.
After reading this document, you'll be in a better position to identify which option best fits your scenario and the
environment required.

Kubernetes choices at a glance


                 OPERATIONAL  EASE OF         FLEXIBILITY  MIXED NODE  SUMMARY
                 COST         CONFIGURATION

Bare-metal       High**       Difficult**     High**       Yes         A ground-up configuration on any
Kubernetes                                                             available infrastructure at location,
                                                                       with the option to use Azure Arc for
                                                                       added Azure capabilities.

K8s on Azure     Low          Easy            Low          Linux only  Kubernetes deployed on an Azure Stack
Stack Edge Pro                                                         Edge appliance deployed at location.

AKS on HCI       Low          Easy            Medium       Yes         AKS deployed on Azure Stack HCI or
                                                                       Windows Server 2019.
*Other managed edge platforms (OpenShift, Tanzu, and so on) aren't in scope for this document.
**These values are based on using kubeadm, for the sake of simplicity. Different options for running bare-metal
Kubernetes at the edge would alter the rating in these categories.

Bare-metal Kubernetes
Ground-up configuration of Kubernetes using tools like kubeadm on any underlying infrastructure.
The biggest constraints for bare-metal Kubernetes are around the specific needs and requirements of the
organization. The opportunity to use any distribution, networking interface, and plugin means higher complexity
and operational cost. But this offers the most flexible option for customizing your cluster.
Scenario
Often, edge locations have specific requirements for running Kubernetes clusters that aren't met by the other
Azure solutions described in this document. This means the option is typically best for those unable to use
managed services due to unsupported existing infrastructure, or those who want maximum control of their
clusters.
This option can be especially difficult for those who are new to Kubernetes. This isn't uncommon for
organizations looking to run edge clusters. Options like MicroK8s or k3s aim to flatten that learning
curve.
It's important to understand any underlying infrastructure and any integration that is expected to take
place up front. This will help to narrow down viable options and to identify any gaps with the open-
source tooling and/or plugins.
Enabling clusters with Azure Arc presents a simple way to manage your cluster from Azure alongside
other resources. This also brings other Azure capabilities to your cluster, including Azure Policy, Azure
Monitor, Microsoft Defender for Cloud, and other services.
Because cluster configuration isn't trivial, it's especially important to be mindful of CI/CD. Tracking and
acting on upstream changes of various plugins, and making sure those changes don't affect the health of
your cluster, becomes a direct responsibility. It's important for you to have a strong CI/CD solution, strong
testing, and monitoring in place.
Tooling options
Cluster bootstrap:
kubeadm: Kubernetes tool for creating ground-up Kubernetes clusters. Good for standard compute
resources (Linux/Windows).
MicroK8s: Simplified administration and configuration ("LowOps"), conformant Kubernetes by Canonical.
k3s: Certified Kubernetes distribution built for Internet of Things (IoT) and edge computing.
Storage:
Explore available CSI drivers: Many options are available to fit your requirements from cloud to local file
shares.
Networking:
A full list of available add-ons can be found here: Networking add-ons. Some popular options include
Flannel, a simple overlay network, and Calico, which provides a full networking stack.
Considerations
Operational cost:
Without the support that comes with managed services, it's up to the organization to maintain and operate
the cluster as a whole (storage, networking, upgrades, observability, application management). The
operational cost is considered high.
Ease of configuration:
Evaluating the many open-source options at every stage of configuration, whether for networking, storage, or
monitoring, is inevitable and can become complex. Configuring CI/CD for the cluster itself also requires extra
consideration. Because of these concerns, the ease of configuration is considered difficult.
Flexibility:
With the ability to use any open-source tool or plugin without any provider restrictions, bare-metal
Kubernetes is highly flexible.

Kubernetes on Azure Stack Edge


Kubernetes cluster (a master VM and a worker VM) configured and deployed for you on your Azure Stack Edge
Pro device.
Azure Stack Edge Pro devices deliver Azure capabilities like compute, storage, networking, and hardware-
accelerated machine learning (ML) to any edge location. Kubernetes clusters can be created once the compute
role is enabled on any of the Pro-GPU, Pro-R, and Mini-R devices. Managing upgrades of the Kubernetes cluster
can be done using standard updates available for the device.
Scenario
Ideal for those with existing (Linux) IoT workloads or upgrading their compute for ML at the edge. This is a good
option when it isn't necessary to have more granular control over the clusters.
Admin permissions aren't granted by default. Although you can work with the product group to make
certain exceptions, this makes it difficult to have finer control of your cluster.
There is an extra cost if there isn't already an Azure Stack Edge device. Explore Azure Stack Edge devices
and see if any fit your compute requirements.
Calico, MetalLB, and CoreDNS are installed for Kubernetes networking on the device.
Only Linux workloads are supported at this time.
In addition to Kubernetes, Azure Stack Edge also comes with the IoT Edge runtime, which means that
workloads may also be deployed to your Azure Stack Edge clusters via IoT Edge.
Support for two-node clusters isn't currently available. This effectively means that this option is not a
highly available (HA) solution.
Considerations
Operational cost:
With the support that comes with the device, operational cost is minimal and is scoped to workload
management.
Ease of configuration:
Pre-configured and well-documented Kubernetes cluster deployment simplifies the configuration required
compared to bare-metal Kubernetes.
Flexibility:
Configuration is already set, and Admin permissions aren't granted by default. Product group involvement
may be required beyond basic configuration, and the underlying infrastructure must be an Azure Stack Edge
Pro device, making this a less flexible option.

AKS on HCI
Note: This option is currently in preview.
AKS-HCI is a set of predefined settings and configurations that is used to deploy one or more Kubernetes
clusters (with Windows Admin Center or PowerShell modules) on a multi-node cluster running either Windows
Server 2019 Datacenter or Azure Stack HCI 20H2.
Scenario
Ideal for those who want a simplified and streamlined way to get a Microsoft-supported cluster on compatible
devices (Azure Stack HCI or Windows Server 2019 Datacenter). Operations and configuration complexity are
reduced at the expense of flexibility when compared to the bare-metal Kubernetes option.
Considerations
At the time of this writing, the preview comes with many limitations (permissions, networking limitations, large
compute requirements, and documentation gaps). Use for purposes other than evaluation and development is
discouraged at this time.
Operational cost:
Microsoft-supported cluster minimizes operational costs.
Ease of configuration:
Pre-configured and well-documented Kubernetes cluster deployment simplifies the configuration required
compared to bare-metal Kubernetes.
Flexibility:
Cluster configuration itself is set, but Admin permissions are granted. The underlying infrastructure must
either be Azure Stack HCI or Windows Server 2019. This option is more flexible than Kubernetes on Azure
Stack Edge and less flexible than bare-metal Kubernetes.

Next steps
For more information, see the following articles:
What is Azure IoT Edge
Kubernetes on your Azure Stack Edge Pro GPU device
Use IoT Edge module to run a Kubernetes stateless application on your Azure Stack Edge Pro GPU device
Deploy a Kubernetes stateless application via kubectl on your Azure Stack Edge Pro GPU device
AI at the edge with Azure Stack Hub
Building a CI/CD pipeline for microservices on Kubernetes
Use Kubernetes dashboard to monitor your Azure Stack Edge Pro GPU device
Understand data store models
3/10/2022 • 12 minutes to read • Edit Online

Modern business systems manage increasingly large volumes of heterogeneous data. This heterogeneity means
that a single data store is usually not the best approach. Instead, it's often better to store different types of data
in different data stores, each focused toward a specific workload or usage pattern. The term polyglot persistence
is used to describe solutions that use a mix of data store technologies. Therefore, it's important to understand
the main storage models and their tradeoffs.
Selecting the right data store for your requirements is a key design decision. There are literally hundreds of
implementations to choose from among SQL and NoSQL databases. Data stores are often categorized by how
they structure data and the types of operations they support. This article describes several of the most common
storage models. Note that a particular data store technology may support multiple storage models. For
example, a relational database management system (RDBMS) may also support key/value or graph storage. In
fact, there is a general trend for so-called multi-model support, where a single database system supports
several models. But it's still useful to understand the different models at a high level.
Not all data stores in a given category provide the same feature-set. Most data stores provide server-side
functionality to query and process data. Sometimes this functionality is built into the data storage engine. In
other cases, the data storage and processing capabilities are separated, and there may be several options for
processing and analysis. Data stores also support different programmatic and management interfaces.
Generally, you should start by considering which storage model is best suited for your requirements. Then
consider a particular data store within that category, based on factors such as feature set, cost, and ease of
management.

NOTE
Learn more about identifying and reviewing your data service requirements for cloud adoption, in the Microsoft Cloud
Adoption Framework for Azure. Likewise, you can also learn about selecting storage tools and services.

Relational database management systems


Relational databases organize data as a series of two-dimensional tables with rows and columns. Most vendors
provide a dialect of the Structured Query Language (SQL) for retrieving and managing data. An RDBMS typically
implements a transactionally consistent mechanism that conforms to the ACID (Atomic, Consistent, Isolated,
Durable) model for updating information.
An RDBMS typically supports a schema-on-write model, where the data structure is defined ahead of time, and
all read or write operations must use the schema.
This model is very useful when strong consistency guarantees are important — where all changes are atomic,
and transactions always leave the data in a consistent state. However, an RDBMS generally can't scale out
horizontally without sharding the data in some way. Also, the data in an RDBMS must be normalized, which isn't
appropriate for every data set.
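The transactional behavior described above can be illustrated with a minimal sketch. The example below uses Python's built-in sqlite3 module as a stand-in for any RDBMS; the table, account names, and amounts are invented for illustration. The point is that both updates commit together or not at all:

```python
import sqlite3

# In-memory SQLite database standing in for any ACID-compliant RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        return True
    except sqlite3.IntegrityError:  # CHECK constraint keeps balances non-negative
        return False

assert transfer(conn, "alice", "bob", 30) is True
assert transfer(conn, "alice", "bob", 500) is False  # would overdraw; whole transaction rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 70, 'bob': 80}
```

The failed transfer leaves the data exactly as it was before the transaction started, which is the "consistent state" guarantee the text refers to.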
Azure services
Azure SQL Database | (Security Baseline)
Azure Database for MySQL | (Security Baseline)
Azure Database for PostgreSQL | (Security Baseline)
Azure Database for MariaDB | (Security Baseline)
Workload
Records are frequently created and updated.
Multiple operations have to be completed in a single transaction.
Relationships are enforced using database constraints.
Indexes are used to optimize query performance.
Data type
Data is highly normalized.
Database schemas are required and enforced.
Many-to-many relationships between data entities in the database.
Constraints are defined in the schema and imposed on any data in the database.
Data requires high integrity. Indexes and relationships need to be maintained accurately.
Data requires strong consistency. Transactions operate in a way that ensures all data are 100% consistent for
all users and processes.
Size of individual data entries is small to medium-sized.
Examples
Inventory management
Order management
Reporting database
Accounting

Key/value stores
A key/value store associates each data value with a unique key. Most key/value stores only support simple
query, insert, and delete operations. To modify a value (either partially or completely), an application must
overwrite the existing data for the entire value. In most implementations, reading or writing a single value is an
atomic operation.
An application can store arbitrary data as a set of values. Any schema information must be provided by the
application. The key/value store simply retrieves or stores the value by key.

Key/value stores are highly optimized for applications performing simple lookups, but are less suitable if you
need to query data across different key/value stores. Key/value stores are also not optimized for querying by
value.
A single key/value store can be extremely scalable, as the data store can easily distribute data across multiple
nodes on separate machines.
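The whole-value overwrite semantics described above can be sketched with a toy in-process store. This is an illustration of the access pattern only, not the API of Azure Cache for Redis or Table Storage; all names and values are invented:

```python
# Minimal key/value store: values are opaque, lookup is by key only,
# and an update replaces the entire value (no partial modification).
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):         # insert or overwrite the whole value
        self._data[key] = value

    def get(self, key, default=None):  # no querying by value, no joins
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("session:42", {"user": "alice", "cart": ["book"]})

# To change part of a value, the application reads it, modifies it,
# and writes the whole value back:
value = store.get("session:42")
value["cart"].append("pen")
store.put("session:42", value)

print(store.get("session:42"))  # {'user': 'alice', 'cart': ['book', 'pen']}
```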
Azure services
Azure Cosmos DB Table API and SQL API | (Cosmos DB Security Baseline)
Azure Cache for Redis | (Security Baseline)
Azure Table Storage | (Security Baseline)
Workload
Data is accessed using a single key, like a dictionary.
No joins, locks, or unions are required.
No aggregation mechanisms are used.
Secondary indexes are generally not used.
Data type
Each key is associated with a single value.
There is no schema enforcement.
No relationships between entities.
Examples
Data caching
Session management
User preference and profile management
Product recommendation and ad serving

Document databases
A document database stores a collection of documents, where each document consists of named fields and data.
The data can be simple values or complex elements such as lists and child collections. Documents are retrieved
by unique keys.
Typically, a document contains the data for a single entity, such as a customer or an order. A document may
contain information that would be spread across several relational tables in an RDBMS. Documents don't need
to have the same structure. Applications can store different data in documents as business requirements change.
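A short sketch makes the point above concrete. The collection, document IDs, and fields below are invented for illustration; each document embeds what an RDBMS would spread across several tables, and documents in the same collection need not share a structure:

```python
# Toy document collection keyed by document id. Each document is a
# self-contained entity retrieved as a single block.
orders = {}

orders["order-1001"] = {
    "customer": {"name": "Contoso", "country": "US"},
    "items": [{"sku": "widget", "qty": 3}, {"sku": "gadget", "qty": 1}],
}
orders["order-1002"] = {
    "customer": {"name": "Fabrikam"},
    "items": [{"sku": "widget", "qty": 10}],
    "giftWrap": True,  # optional field only some documents carry
}

# Retrieval by unique key returns the whole entity:
doc = orders["order-1002"]
print(doc.get("giftWrap", False))                    # True
print(orders["order-1001"].get("giftWrap", False))   # False
```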

Azure service
Azure Cosmos DB SQL API | (Cosmos DB Security Baseline)
Workload
Insert and update operations are common.
No object-relational impedance mismatch. Documents can better match the object structures used in
application code.
Individual documents are retrieved and written as a single block.
Data requires index on multiple fields.
Data type
Data can be managed in de-normalized way.
Size of individual document data is relatively small.
Each document type can use its own schema.
Documents can include optional fields.
Document data is semi-structured, meaning that data types of each field are not strictly defined.
Examples
Product catalog
Content management
Inventory management

Graph databases
A graph database stores two types of information, nodes and edges. Edges specify relationships between nodes.
Nodes and edges can have properties that provide information about that node or edge, similar to columns in a
table. Edges can also have a direction indicating the nature of the relationship.
Graph databases can efficiently perform queries across the network of nodes and edges and analyze the
relationships between entities. The following diagram shows an organization's personnel database structured as
a graph. The entities are employees and departments, and the edges indicate reporting relationships and the
departments in which employees work.

This structure makes it straightforward to perform queries such as "Find all employees who report directly or
indirectly to Sarah" or "Who works in the same department as John?" For large graphs with lots of entities and
relationships, you can perform very complex analyses very quickly. Many graph databases provide a query
language that you can use to traverse a network of relationships efficiently.
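The "report directly or indirectly to Sarah" query above reduces to a graph traversal. The sketch below is an in-memory illustration of that idea (names other than Sarah and John are invented), not the Gremlin query language a real graph database would use:

```python
from collections import deque

# Directed "reports_to" edges: report -> manager.
reports_to = {
    "John": "Sarah",
    "Mary": "Sarah",
    "Dan": "John",
    "Priya": "Dan",
}

# Invert the edges so we can walk downward from a manager.
direct_reports = {}
for emp, mgr in reports_to.items():
    direct_reports.setdefault(mgr, []).append(emp)

def all_reports(manager):
    """Everyone who reports directly or indirectly to `manager` (BFS)."""
    found, queue = [], deque(direct_reports.get(manager, []))
    while queue:
        emp = queue.popleft()
        found.append(emp)
        queue.extend(direct_reports.get(emp, []))
    return found

print(sorted(all_reports("Sarah")))  # ['Dan', 'John', 'Mary', 'Priya']
```

In a graph database the edges are stored and indexed directly, so this multi-hop traversal does not require the joins an RDBMS would need.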
Azure services
Azure Cosmos DB Gremlin API | (Security Baseline)
SQL Server | (Security Baseline)
Workload
Complex relationships between data items involving many hops between related data items.
The relationships between data items are dynamic and change over time.
Relationships between objects are first-class citizens, without requiring foreign-keys and joins to traverse.
Data type
Nodes and relationships.
Nodes are similar to table rows or JSON documents.
Relationships are just as important as nodes, and are exposed directly in the query language.
Composite objects, such as a person with multiple phone numbers, tend to be broken into separate, smaller
nodes, combined with traversable relationships.
Examples
Organization charts
Social graphs
Fraud detection
Recommendation engines

Data analytics
Data analytics stores provide massively parallel solutions for ingesting, storing, and analyzing data. The data is
distributed across multiple servers to maximize scalability. Large data file formats such as delimited files (CSV),
Parquet, and ORC are widely used in data analytics. Historical data is typically stored in data stores such as blob
storage or Azure Data Lake Storage Gen2, which are then accessed by Azure Synapse, Databricks, or HDInsight
as external tables. A typical scenario that stores data as Parquet files for performance is described in the
article Use external tables with Synapse SQL.
Azure services
Azure Synapse Analytics | (Security Baseline)
Azure Data Lake | (Security Baseline)
Azure Data Explorer | (Security Baseline)
Azure Analysis Services
HDInsight | (Security Baseline)
Azure Databricks | (Security Baseline)
Workload
Data analytics
Enterprise BI
Data type
Historical data from multiple sources.
Usually denormalized in a "star" or "snowflake" schema, consisting of fact and dimension tables.
Usually loaded with new data on a scheduled basis.
Dimension tables often include multiple historic versions of an entity, referred to as a slowly changing
dimension.
Examples
Enterprise data warehouse
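The star-schema layout described above (fact and dimension tables) can be sketched in a few lines. This is a toy in-memory illustration; the product names and sales figures are invented, and a real warehouse would run this as a SQL join-and-aggregate:

```python
# Dimension table: descriptive attributes, keyed by surrogate key.
dim_product = {
    1: {"name": "widget", "category": "tools"},
    2: {"name": "gadget", "category": "toys"},
}

# Fact table: one row per measurable event, with foreign keys into dimensions.
fact_sales = [
    {"product_id": 1, "amount": 100},
    {"product_id": 2, "amount": 40},
    {"product_id": 1, "amount": 60},
]

# A typical BI query: join facts to a dimension and aggregate by an attribute.
totals = {}
for row in fact_sales:
    cat = dim_product[row["product_id"]]["category"]
    totals[cat] = totals.get(cat, 0) + row["amount"]

print(totals)  # {'tools': 160, 'toys': 40}
```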

Column-family databases
A column-family database organizes data into rows and columns. In its simplest form, a column-family database
can appear very similar to a relational database, at least conceptually. The real power of a column-family
database lies in its denormalized approach to structuring sparse data.
You can think of a column-family database as holding tabular data with rows and columns, but the columns are
divided into groups known as column families. Each column family holds a set of columns that are logically
related together and are typically retrieved or manipulated as a unit. Other data that is accessed separately can
be stored in separate column families. Within a column family, new columns can be added dynamically, and
rows can be sparse (that is, a row doesn't need to have a value for every column).
The following diagram shows an example with two column families, Identity and Contact Info . The data for a
single entity has the same row key in each column-family. This structure, where the rows for any given object in
a column family can vary dynamically, is an important benefit of the column-family approach, making this form
of data store highly suited for storing structured, volatile data.

Unlike a key/value store or a document database, most column-family databases store data in key order, rather
than by computing a hash. Many implementations allow you to create indexes over specific columns in a
column-family. Indexes let you retrieve data by column value, rather than row key.
Read and write operations for a row are usually atomic within a single column-family, although some
implementations provide atomicity across the entire row, spanning multiple column-families.
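The structure described above, with the Identity and Contact Info families from the example, can be sketched as nested dictionaries. This is an illustration of the data model only (row keys and values are invented), not the API of Cassandra or HBase:

```python
# Row key -> column family -> columns. Rows may be sparse: not every
# row needs a value for every column.
table = {
    "emp-1": {
        "Identity": {"first": "Ana", "last": "Silva"},
        "ContactInfo": {"email": "ana@example.com", "phone": "555-0100"},
    },
    "emp-2": {
        "Identity": {"first": "Lee"},               # sparse: no last name
        "ContactInfo": {"email": "lee@example.com"},
    },
}

def get_family(row_key, family):
    """Read one column family for a row as a unit (the typical access path)."""
    return table.get(row_key, {}).get(family, {})

# New columns can be added dynamically to a single row:
table["emp-2"]["ContactInfo"]["twitter"] = "@lee"

print(get_family("emp-2", "ContactInfo"))
# {'email': 'lee@example.com', 'twitter': '@lee'}
```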
Azure services
Azure Cosmos DB Cassandra API | (Security Baseline)
HBase in HDInsight | (Security Baseline)
Workload
Most column-family databases perform write operations extremely quickly.
Update and delete operations are rare.
Designed to provide high throughput and low-latency access.
Supports easy query access to a particular set of fields within a much larger record.
Massively scalable.
Data type
Data is stored in tables consisting of a key column and one or more column families.
Specific columns can vary by individual rows.
Individual cells are accessed via get and put commands.
Multiple rows are returned using a scan command.
Examples
Recommendations
Personalization
Sensor data
Telemetry
Messaging
Social media analytics
Web analytics
Activity monitoring
Weather and other time-series data

Search Engine Databases


A search engine database allows applications to search for information held in external data stores. A search
engine database can index massive volumes of data and provide near real-time access to these indexes.
Indexes can be multi-dimensional and may support free-text searches across large volumes of text data.
Indexing can be performed using a pull model, triggered by the search engine database, or using a push model,
initiated by external application code.
Searching can be exact or fuzzy. A fuzzy search finds documents that match a set of terms and calculates how
closely they match. Some search engines also support linguistic analysis that can return matches based on
synonyms, genre expansions (for example, matching dogs to pets), and stemming (matching words with the
same root).
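The core mechanism behind such searches is an inverted index mapping each term to the documents that contain it. The sketch below shows that idea with a crude term-count score; it is a simplification (no stemming, synonyms, or relevance weighting), not the behavior of Azure Search, and the documents are invented:

```python
# Build an inverted index: term -> set of document ids containing it.
docs = {
    1: "red running shoes for trail running",
    2: "blue dress shoes",
    3: "trail map and running guide",
}

index = {}
for doc_id, text in docs.items():
    for term in set(text.lower().split()):
        index.setdefault(term, set()).add(doc_id)

def search(query):
    """Rank documents by how many query terms they match."""
    scores = {}
    for term in query.lower().split():
        for doc_id in index.get(term, set()):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(search("trail running shoes"))  # [(1, 3), (3, 2), (2, 1)]
```

Document 1 matches all three terms, so it ranks first; a real engine would also weight terms by rarity and document length.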
Azure service
Azure Search | (Security Baseline)
Workload
Data indexes from multiple sources and services.
Queries are ad-hoc and can be complex.
Full text search is required.
Ad hoc self-service query is required.
Data type
Semi-structured or unstructured text
Text with reference to structured data
Examples
Product catalogs
Site search
Logging

Time series databases


Time series data is a set of values organized by time. Time series databases typically collect large amounts of
data in real time from a large number of sources. Updates are rare, and deletes are often done as bulk
operations. Although the records written to a time-series database are generally small, there are often a large
number of records, and total data size can grow rapidly.
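The append-mostly, range-scan access pattern described above can be sketched with a sorted in-memory series. This is an illustration of the workload shape (timestamps and readings are invented), not the API of any time series database:

```python
import bisect

# Parallel lists kept sorted by timestamp; writes dominate, reads are
# sequential scans over a contiguous time range.
timestamps, values = [], []

def append(ts, value, tags=None):
    """Records arrive in (mostly) ascending time order."""
    i = bisect.bisect_right(timestamps, ts)  # tolerate slightly late arrivals
    timestamps.insert(i, ts)
    values.insert(i, (value, tags or {}))

def scan(start_ts, end_ts):
    """Sequential read of a contiguous time range, in ascending order."""
    lo = bisect.bisect_left(timestamps, start_ts)
    hi = bisect.bisect_right(timestamps, end_ts)
    return list(zip(timestamps[lo:hi], [v for v, _ in values[lo:hi]]))

for ts, temp in [(100, 21.5), (110, 21.7), (120, 22.0), (130, 22.4)]:
    append(ts, temp, {"sensor": "s1"})

print(scan(105, 125))  # [(110, 21.7), (120, 22.0)]
```

The timestamp acts as the primary key and sort order, and tags carry the descriptive metadata, mirroring the data type description above.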
Azure service
Azure Time Series Insights
Workload
Records are generally appended sequentially in time order.
An overwhelming proportion of operations (95-99%) are writes.
Updates are rare.
Deletes occur in bulk, and are made to contiguous blocks or records.
Data is read sequentially in either ascending or descending time order, often in parallel.
Data type
A timestamp is used as the primary key and sorting mechanism.
Tags may define additional information about the type, origin, and other information about the entry.
Examples
Monitoring and event telemetry.
Sensor or other IoT data.

Object storage
Object storage is optimized for storing and retrieving large binary objects (images, files, video and audio
streams, large application data objects and documents, virtual machine disk images). Large data files are also
popularly used in this model, for example, delimited files (CSV), Parquet, and ORC. Object stores can manage
extremely large amounts of unstructured data.
Azure service
Azure Blob Storage | (Security Baseline)
Azure Data Lake Storage Gen2 | (Security Baseline)
Workload
Identified by key.
Content is typically an asset such as a delimited file, image, or video file.
Content must be durable and external to any application tier.
Data type
Data size is large.
Value is opaque.
Examples
Images, videos, office documents, PDFs
Static HTML, JSON, CSS
Log and audit files
Database backups

Shared files
Sometimes, using simple flat files can be the most effective means of storing and retrieving information. Using
file shares enables files to be accessed across a network. Given appropriate security and concurrent access
control mechanisms, sharing data in this way can enable distributed services to provide highly scalable data
access for performing basic, low-level operations such as simple read and write requests.
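The point above is that a file share looks like an ordinary directory tree to application code. The sketch below uses a local temporary directory as a stand-in for a mounted SMB share (the folder and file names are invented); once an Azure Files share is mounted, the same standard I/O calls apply:

```python
import pathlib
import tempfile

# Local stand-in for a mounted file share.
share = pathlib.Path(tempfile.mkdtemp()) / "reports" / "2022"
share.mkdir(parents=True)

# Writing a flat file with standard I/O libraries:
(share / "q1.csv").write_text("region,sales\nwest,1200\neast,950\n")

# Any process (or VM) with the share mounted reads the same file:
for line in (share / "q1.csv").read_text().splitlines():
    print(line)
```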
Azure service
Azure Files | (Security Baseline)
Workload
Migration from existing apps that interact with the file system.
Requires SMB interface.
Data type
Files in a hierarchical set of folders.
Accessible with standard I/O libraries.
Examples
Legacy files
Shared content accessible among a number of VMs or app instances
Aided with this understanding of different data storage models, the next step is to evaluate your workload and
application, and decide which data store will meet your specific needs. Use the data storage decision tree to help
with this process.
Select an Azure data store for your application
3/10/2022 • 2 minutes to read • Edit Online

Azure offers a number of managed data storage solutions, each providing different features and capabilities.
This article will help you to choose a managed data store for your application.
If your application consists of multiple workloads, evaluate each workload separately. A complete solution may
incorporate multiple data stores.

Select a candidate
Use the following flowchart to select a candidate Azure managed data store.
The output from this flowchart is a starting point for consideration. Next, perform a more detailed evaluation
of the data store to see if it meets your needs. Refer to Criteria for choosing a data store to aid in this evaluation.

Choose specialized storage


Alternative database solutions often require specific storage solutions. For example, SAP HANA on VMs often
employs Azure NetApp Files as its underlying storage solution. Evaluate your vendor's requirements to find an
appropriate storage solution to meet your database's requirements. For more information about selecting a
storage solution, see Review your storage options.
Criteria for choosing a data store
3/10/2022 • 3 minutes to read • Edit Online

This article describes the comparison criteria you should use when evaluating a data store. The goal is to help
you determine which data storage types can meet your solution's requirements.

General considerations
Keep the following considerations in mind when making your selection.
Functional requirements
Data format . What type of data are you intending to store? Common types include transactional data,
JSON objects, telemetry, search indexes, or flat files.
Data size . How large are the entities you need to store? Will these entities need to be maintained as a
single document, or can they be split across multiple documents, tables, collections, and so forth?
Scale and structure . What is the overall amount of storage capacity you need? Do you anticipate
partitioning your data?
Data relationships . Will your data need to support one-to-many or many-to-many relationships? Are
relationships themselves an important part of the data? Will you need to join or otherwise combine data
from within the same dataset, or from external datasets?
Consistency model . How important is it for updates made in one node to appear in other nodes, before
further changes can be made? Can you accept eventual consistency? Do you need ACID guarantees for
transactions?
Schema flexibility . What kind of schemas will you apply to your data? Will you use a fixed schema, a
schema-on-write approach, or a schema-on-read approach?
Concurrency . What kind of concurrency mechanism do you want to use when updating and
synchronizing data? Will the application perform many updates that could potentially conflict? If so, you
may require record locking and pessimistic concurrency control. Alternatively, can you support optimistic
concurrency controls? If so, is simple timestamp-based concurrency control enough, or do you need the
added functionality of multi-version concurrency control?
Data movement . Will your solution need to perform ETL tasks to move data to other stores or data
warehouses?
Data lifecycle . Is the data write-once, read-many? Can it be moved into cool or cold storage?
Other suppor ted features . Do you need any other specific features, such as schema validation,
aggregation, indexing, full-text search, MapReduce, or other query capabilities?
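The optimistic concurrency question above can be made concrete with a minimal version-check sketch (the ETag-style pattern many data stores expose). The record, keys, and version scheme below are invented for illustration:

```python
# Optimistic concurrency: each record carries a version number, and a
# writer's update is rejected if the record changed since it was read.
class ConcurrencyError(Exception):
    pass

store = {"profile:1": {"version": 1, "data": {"theme": "dark"}}}

def read(key):
    rec = store[key]
    return rec["version"], dict(rec["data"])

def update(key, expected_version, new_data):
    rec = store[key]
    if rec["version"] != expected_version:  # someone else wrote first
        raise ConcurrencyError("version mismatch; re-read and retry")
    rec["data"] = new_data
    rec["version"] += 1

# Two clients read the same version of the record...
v_a, data_a = read("profile:1")
v_b, data_b = read("profile:1")

update("profile:1", v_a, {"theme": "light"})      # client A's write succeeds
try:
    update("profile:1", v_b, {"theme": "sepia"})  # client B's write is stale
except ConcurrencyError as e:
    print("rejected:", e)

print(store["profile:1"])  # {'version': 2, 'data': {'theme': 'light'}}
```

Pessimistic control would instead lock the record for the duration of the update; the optimistic approach avoids locks at the cost of retries when conflicts occur.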
Non-functional requirements
Performance and scalability . What are your data performance requirements? Do you have specific
requirements for data ingestion rates and data processing rates? What are the acceptable response times
for querying and aggregation of data once ingested? How large will you need the data store to scale up?
Is your workload more read-heavy or write-heavy?
Reliability . What overall SLA do you need to support? What level of fault-tolerance do you need to
provide for data consumers? What kind of backup and restore capabilities do you need?
Replication . Will your data need to be distributed among multiple replicas or regions? What kind of data
replication capabilities do you require?
Limits . Will the limits of a particular data store support your requirements for scale, number of
connections, and throughput?
Management and cost
Managed ser vice . When possible, use a managed data service, unless you require specific capabilities
that can only be found in an IaaS-hosted data store.
Region availability . For managed services, is the service available in all Azure regions? Does your
solution need to be hosted in certain Azure regions?
Por tability . Will your data need to be migrated to on-premises, external datacenters, or other cloud
hosting environments?
Licensing . Do you have a preference of a proprietary versus OSS license type? Are there any other
external restrictions on what type of license you can use?
Overall cost . What is the overall cost of using the service within your solution? How many instances will
need to run, to support your uptime and throughput requirements? Consider operations costs in this
calculation. One reason to prefer managed services is the reduced operational cost.
Cost effectiveness . Can you partition your data, to store it more cost effectively? For example, can you
move large objects out of an expensive relational database into an object store?
Security
Security . What type of encryption do you require? Do you need encryption at rest? What authentication
mechanism do you want to use to connect to your data?
Auditing . What kind of audit log do you need to generate?
Networking requirements . Do you need to restrict or otherwise manage access to your data from
other network resources? Does data need to be accessible only from inside the Azure environment? Does
the data need to be accessible from specific IP addresses or subnets? Does it need to be accessible from
applications or services hosted on-premises or in other external datacenters?
DevOps
Skill set . Are there particular programming languages, operating systems, or other technology that your
team is particularly adept at using? Are there others that would be difficult for your team to work with?
Clients . Is there good client support for your development languages?
Choose a big data storage technology in Azure
3/10/2022 • 8 minutes to read • Edit Online

This topic compares options for data storage for big data solutions — specifically, data storage for bulk data
ingestion and batch processing, as opposed to analytical data stores or real-time streaming ingestion.

What are your options when choosing data storage in Azure?


There are several options for ingesting data into Azure, depending on your needs.
File storage:
Azure Storage blobs
Azure Data Lake Store
NoSQL databases:
Azure Cosmos DB
HBase on HDInsight
Analytical databases:
Azure Data Explorer

Azure Storage blobs


Azure Storage is a managed storage service that is highly available, secure, durable, scalable, and redundant.
Microsoft takes care of maintenance and handles critical problems for you. Azure Storage is the most ubiquitous
storage solution Azure provides, due to the number of services and tools that can be used with it.
There are various Azure Storage services you can use to store data. The most flexible option for storing blobs
from a number of data sources is Blob storage. Blobs are basically files. They store pictures, documents, HTML
files, virtual hard disks (VHDs), big data such as logs, database backups — pretty much anything. Blobs are
stored in containers, which are similar to folders. A container provides a grouping of a set of blobs. A storage
account can contain an unlimited number of containers, and a container can store an unlimited number of blobs.
Azure Storage is a good choice for big data and analytics solutions, because of its flexibility, high availability, and
low cost. It provides hot, cool, and archive storage tiers for different use cases. For more information, see Azure
Blob Storage: Hot, cool, and archive storage tiers.
Azure Blob storage can be accessed from Hadoop (available through HDInsight). HDInsight can use a blob
container in Azure Storage as the default file system for the cluster. Through a Hadoop distributed file system
(HDFS) interface provided by a WASB driver, the full set of components in HDInsight can operate directly on
structured or unstructured data stored as blobs. Azure Blob storage can also be accessed via Azure Synapse
Analytics using its PolyBase feature.
Other features that make Azure Storage a good choice are:
Multiple concurrency strategies.
Disaster recovery and high availability options.
Encryption at rest.
Azure role-based access control (Azure RBAC) to control access using Azure Active Directory users and
groups.
Azure Data Lake Store
Azure Data Lake Store is an enterprise-wide hyperscale repository for big data analytic workloads. Data Lake
enables you to capture data of any size, type, and ingestion speed in one single secure location for operational
and exploratory analytics.
Data Lake Store does not impose any limits on account sizes, file sizes, or the amount of data that can be stored
in a data lake. Data is stored durably by making multiple copies and there is no limit on the duration of time that
the data can be stored in the Data Lake. In addition to making multiple copies of files to guard against any
unexpected failures, Data lake spreads parts of a file over a number of individual storage servers. This improves
the read throughput when reading the file in parallel for performing data analytics.
Data Lake Store can be accessed from Hadoop (available through HDInsight) using the WebHDFS-compatible
REST APIs. You may consider using this as an alternative to Azure Storage when your individual or combined file
sizes exceed that which is supported by Azure Storage. However, there are performance tuning guidelines you
should follow when using Data Lake Store as your primary storage for an HDInsight cluster, with specific
guidelines for Spark, Hive, MapReduce, and Storm. Also, be sure to check Data Lake Store's regional availability,
because it is not available in as many regions as Azure Storage, and it needs to be located in the same region as
your HDInsight cluster.
Coupled with Azure Data Lake Analytics, Data Lake Store is specifically designed to enable analytics on the
stored data and is tuned for performance for data analytics scenarios. Data Lake Store can also be accessed via
Azure Synapse using its PolyBase feature.

Azure Cosmos DB
Azure Cosmos DB is Microsoft's globally distributed multi-model database. Cosmos DB guarantees single-digit-
millisecond latencies at the 99th percentile anywhere in the world, offers multiple well-defined consistency
models to fine-tune performance, and guarantees high availability with multi-homing capabilities.
Azure Cosmos DB is schema-agnostic. It automatically indexes all the data without requiring you to deal with
schema and index management. It's also multi-model, natively supporting document, key-value, graph, and
column-family data models.
Azure Cosmos DB features:
Geo-replication
Elastic scaling of throughput and storage worldwide
Five well-defined consistency levels

HBase on HDInsight
Apache HBase is an open-source, NoSQL database that is built on Hadoop and modeled after Google BigTable.
HBase provides random access and strong consistency for large amounts of unstructured and semi-structured
data in a schemaless database organized by column families.
Data is stored in the rows of a table, and data within a row is grouped by column family. HBase is schemaless in
the sense that neither the columns nor the type of data stored in them need to be defined before using them.
The open-source code scales linearly to handle petabytes of data on thousands of nodes. It can rely on data
redundancy, batch processing, and other features that are provided by distributed applications in the Hadoop
ecosystem.
The HDInsight implementation leverages the scale-out architecture of HBase to provide automatic sharding of
tables, strong consistency for reads and writes, and automatic failover. Performance is enhanced by in-memory
caching for reads and high-throughput streaming for writes. In most cases, you'll want to create the HBase
cluster inside a virtual network so other HDInsight clusters and applications can directly access the tables.
Azure Data Explorer
Azure Data Explorer is a fast and highly scalable data exploration service for log and telemetry data. It helps you
handle the many data streams emitted by modern software so you can collect, store, and analyze data. Azure
Data Explorer is ideal for analyzing large volumes of diverse data from any data source, such as websites,
applications, IoT devices, and more. This data is used for diagnostics, monitoring, reporting, machine learning,
and additional analytics capabilities. Azure Data Explorer makes it simple to ingest this data and enables you to
do complex ad hoc queries on the data in seconds.
Azure Data Explorer can be linearly scaled out for increasing ingestion and query processing throughput. An
Azure Data Explorer cluster can be deployed to a Virtual Network for enabling private networks.

Key selection criteria


To narrow the choices, start by answering these questions:
Do you need managed, high-speed, cloud-based storage for any type of text or binary data? If yes, then
select one of the file storage or analytics options.
Do you need file storage that is optimized for parallel analytics workloads and high throughput/IOPS? If
yes, then choose an option that is tuned to analytics workload performance.
Do you need to store unstructured or semi-structured data in a schemaless database? If so, select one of
the non-relational or analytics options. Compare options for indexing and database models. Depending
on the type of data you need to store, the primary database models may be the largest factor.
Can you use the service in your region? Check the regional availability for each Azure service. See
Products available by region.

Capability matrix
The following tables summarize the key differences in capabilities.
File storage capabilities
| Capability | Azure Data Lake Store | Azure Blob storage containers |
| --- | --- | --- |
| Purpose | Optimized storage for big data analytics workloads | General purpose object store for a wide variety of storage scenarios |
| Use cases | Batch, streaming analytics, and machine learning data such as log files, IoT data, click streams, large datasets | Any type of text or binary data, such as application back end, backup data, media storage for streaming, and general purpose data |
| Structure | Hierarchical file system | Object store with flat namespace |
| Authentication | Based on Azure Active Directory identities | Based on shared secrets: Account Access Keys and Shared Access Signature Keys, and Azure role-based access control (Azure RBAC) |
| Authentication protocol | OAuth 2.0. Calls must contain a valid JWT (JSON Web Token) issued by Azure Active Directory | Hash-based message authentication code (HMAC). Calls must contain a Base64-encoded SHA-256 hash over part of the HTTP request |
| Authorization | POSIX access control lists (ACLs). ACLs based on Azure Active Directory identities can be set at the file and folder level | For account-level authorization, use Account Access Keys. For account, container, or blob authorization, use Shared Access Signature Keys |
| Auditing | Available | Available |
| Encryption at rest | Transparent, server side | Transparent, server side; client-side encryption |
| Developer SDKs | .NET, Java, Python, Node.js | .NET, Java, Python, Node.js, C++, Ruby |
| Analytics workload performance | Optimized performance for parallel analytics workloads, high throughput and IOPS | Not optimized for analytics workloads |
| Size limits | No limits on account sizes, file sizes, or number of files | Specific limits documented here |
| Geo-redundancy | Locally redundant (LRS), globally redundant (GRS), read-access globally redundant (RA-GRS), zone-redundant (ZRS) | Locally redundant (LRS), globally redundant (GRS), read-access globally redundant (RA-GRS), zone-redundant (ZRS). See here for more information |

NoSQL database capabilities


| Capability | Azure Cosmos DB | HBase on HDInsight |
| --- | --- | --- |
| Primary database model | Document store, graph, key-value store, wide column store | Wide column store |
| Secondary indexes | Yes | No |
| SQL language support | Yes | Yes (using the Phoenix JDBC driver) |
| Consistency | Strong, bounded-staleness, session, consistent prefix, eventual | Strong |
| Native Azure Functions integration | Yes | No |
| Automatic global distribution | Yes | No; HBase cluster replication can be configured across regions with eventual consistency |
| Pricing model | Elastically scalable request units (RUs) charged per-second as needed, elastically scalable storage | Per-minute pricing for HDInsight cluster (horizontal scaling of nodes), storage |

Analytical database capabilities


| Capability | Azure Data Explorer |
| --- | --- |
| Primary database model | Relational (column store), telemetry, and time series store |
| SQL language support | Yes |
| Pricing model | Elastically scalable cluster instances |
| Authentication | Based on Azure Active Directory identities |
| Encryption at rest | Supported, customer-managed keys |
| Analytics workload performance | Optimized performance for parallel analytics workloads |
| Size limits | Linearly scalable |


Online transaction processing (OLTP)
3/10/2022 • 6 minutes to read

The management of transactional data using computer systems is referred to as online transaction processing
(OLTP). OLTP systems record business interactions as they occur in the day-to-day operation of the organization,
and support querying of this data to make inferences.

Transactional data
Transactional data is information that tracks the interactions related to an organization's activities. These
interactions are typically business transactions, such as payments received from customers, payments made to
suppliers, products moving through inventory, orders taken, or services delivered. Transactional events, which
represent the transactions themselves, typically contain a time dimension, some numerical values, and
references to other data.
Transactions typically need to be atomic and consistent. Atomicity means that an entire transaction always
succeeds or fails as one unit of work, and is never left in a half-completed state. If a transaction cannot be
completed, the database system must roll back any steps that were already done as part of that transaction. In a
traditional RDBMS, this rollback happens automatically if a transaction cannot be completed. Consistency means
that transactions always leave the data in a valid state. (These are very informal descriptions of atomicity and
consistency. There are more formal definitions of these properties, such as ACID.)
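The rollback behavior described above can be seen with any relational database; the following sketch uses SQLite (the accounts table and amounts are invented) to show a transfer that either commits both updates or neither:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])

def transfer(conn, src, dst, amount):
    """Move funds as one atomic unit of work: both updates commit, or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back too, so the database never holds a half-transfer.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # constraint failed; rollback restored the prior state

transfer(conn, 1, 2, 30)   # succeeds
transfer(conn, 1, 2, 500)  # would overdraw, so nothing changes
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)
```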
Transactional databases can support strong consistency for transactions using various locking strategies, such as
pessimistic locking, to ensure that all data is strongly consistent within the context of the enterprise, for all users
and processes.
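In contrast to pessimistic locking, an optimistic strategy lets writers proceed without holding locks and detects conflicts at update time, commonly with a version (or ETag) column. A minimal sketch, again with SQLite and an invented products table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price INTEGER, version INTEGER)")
conn.execute("INSERT INTO products VALUES (1, 10, 1)")
conn.commit()

def update_price(conn, product_id, new_price, expected_version):
    """Optimistic concurrency: the UPDATE applies only if nobody changed the
    row since we read it (version still matches); returns True on success."""
    cur = conn.execute(
        "UPDATE products SET price = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_price, product_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows touched means a concurrent writer won

# Two clients read version 1; the first write wins, the second must re-read and retry.
first = update_price(conn, 1, 12, expected_version=1)
second = update_price(conn, 1, 15, expected_version=1)
print(first, second)
```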
The most common deployment architecture that uses transactional data is the data store tier in a 3-tier
architecture. A 3-tier architecture typically consists of a presentation tier, business logic tier, and data store tier. A
related deployment architecture is the N-tier architecture, which may have multiple middle-tiers handling
business logic.

Typical traits of transactional data


Transactional data tends to have the following traits:

| Requirement | Description |
| --- | --- |
| Normalization | Highly normalized |
| Schema | Schema on write, strongly enforced |
| Consistency | Strong consistency, ACID guarantees |
| Integrity | High integrity |
| Uses transactions | Yes |
| Locking strategy | Pessimistic or optimistic |
| Updateable | Yes |
| Appendable | Yes |
| Workload | Heavy writes, moderate reads |
| Indexing | Primary and secondary indexes |
| Datum size | Small to medium sized |
| Model | Relational |
| Data shape | Tabular |
| Query flexibility | Highly flexible |
| Scale | Small (MBs) to large (a few TBs) |

When to use this solution


Choose OLTP when you need to efficiently process and store business transactions and immediately make them
available to client applications in a consistent way. Use this architecture when any tangible delay in processing
would have a negative impact on the day-to-day operations of the business.
OLTP systems are designed to efficiently process and store transactions, as well as query transactional data. The
goal of efficiently processing and storing individual transactions by an OLTP system is partly accomplished by
data normalization — that is, breaking the data up into smaller chunks that are less redundant. This supports
efficiency because it enables the OLTP system to process large numbers of transactions independently, and
avoids extra processing needed to maintain data integrity in the presence of redundant data.
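As a small illustration of that normalization, the sketch below (SQLite, with invented customers and orders tables) stores customer details exactly once, so a transactional update touches a single row, while reporting must join the tables back together:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount INTEGER
    );
    INSERT INTO customers VALUES (1, 'Contoso'), (2, 'Fabrikam');
    INSERT INTO orders VALUES (101, 1, 250), (102, 1, 90), (103, 2, 40);
""")

# Each order row carries only a key reference to the customer, not a copy of
# the customer's details, so renaming a customer touches exactly one row.
conn.execute("UPDATE customers SET name = 'Contoso Ltd' WHERE id = 1")

# Reporting queries must re-join (de-normalize) the pieces.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.amount)
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)
```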

Challenges
Implementing and using an OLTP system can create a few challenges:
OLTP systems are not always good for handling aggregates over large amounts of data, although there are
exceptions, such as a well-planned SQL Server-based solution. Analytics that rely on aggregate
calculations over millions of individual transactions are very resource intensive for an OLTP
system. They can be slow to execute and can cause a slowdown by blocking other transactions in the
database.
When conducting analytics and reporting on data that is highly normalized, the queries tend to be complex,
because most queries need to de-normalize the data by using joins. Also, naming conventions for database
objects in OLTP systems tend to be terse. The heavy normalization, coupled with terse naming
conventions, makes OLTP systems difficult for business users to query without the help of a DBA or
data developer.
Storing the history of transactions indefinitely and storing too much data in any one table can lead to slow
query performance, depending on the number of transactions stored. The common solution is to maintain a
relevant window of time (such as the current fiscal year) in the OLTP system and offload historical data to
other systems, such as a data mart or data warehouse.

OLTP in Azure
Applications such as websites hosted in App Service Web Apps, REST APIs running in App Service, or mobile or
desktop applications communicate with the OLTP system, typically via a REST API intermediary.
In practice, most workloads are not purely OLTP. There tends to be an analytical component as well. In addition,
there is an increasing demand for real-time reporting, such as running reports against the operational system.
This is also referred to as HTAP (Hybrid Transactional and Analytical Processing). For more information, see
Online Analytical Processing (OLAP).
In Azure, all of the following data stores will meet the core requirements for OLTP and the management of
transaction data:
Azure SQL Database
SQL Server in an Azure virtual machine
Azure Database for MySQL
Azure Database for PostgreSQL

Key selection criteria


To narrow the choices, start by answering these questions:
Do you want a managed service rather than managing your own servers?
Does your solution have specific dependencies for Microsoft SQL Server, MySQL or PostgreSQL
compatibility? Your application may limit the data stores you can choose based on the drivers it supports
for communicating with the data store, or the assumptions it makes about which database is used.
Are your write throughput requirements particularly high? If yes, choose an option that provides in-
memory tables.
Is your solution multitenant? If so, consider options that support capacity pools, where multiple database
instances draw from an elastic pool of resources, instead of fixed resources per database. This can help
you better distribute capacity across all database instances, and can make your solution more cost
effective.
Does your data need to be readable with low latency in multiple regions? If yes, choose an option that
supports readable secondary replicas.
Does your database need to be highly available across geographic regions? If yes, choose an option that
supports geographic replication. Also consider the options that support automatic failover from the
primary replica to a secondary replica.
Does your database have specific security needs? If yes, examine the options that provide capabilities like
row level security, data masking, and transparent data encryption.

Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities
| Capability | Azure SQL Database | SQL Server in an Azure virtual machine | Azure Database for MySQL | Azure Database for PostgreSQL |
| --- | --- | --- | --- | --- |
| Is managed service | Yes | No | Yes | Yes |
| Runs on platform | N/A | Windows, Linux, Docker | N/A | N/A |
| Programmability 1 | T-SQL, .NET, R | T-SQL, .NET, R, Python | SQL | SQL, PL/pgSQL |

[1] Not including client driver support, which allows many programming languages to connect to and use the
OLTP data store.
Scalability capabilities
| Capability | Azure SQL Database | SQL Server in an Azure virtual machine | Azure Database for MySQL | Azure Database for PostgreSQL |
| --- | --- | --- | --- | --- |
| Maximum database instance size | 4 TB | 256 TB | 16 TB | 16 TB |
| Supports capacity pools | Yes | Yes | No | No |
| Supports clusters scale out | No | Yes | No | No |
| Dynamic scalability (scale up) | Yes | No | Yes | Yes |
Analytic workload capabilities


| Capability | Azure SQL Database | SQL Server in an Azure virtual machine | Azure Database for MySQL | Azure Database for PostgreSQL |
| --- | --- | --- | --- | --- |
| Temporal tables | Yes | Yes | No | No |
| In-memory (memory-optimized) tables | Yes | Yes | No | No |
| Columnstore support | Yes | Yes | No | No |
| Adaptive query processing | Yes | Yes | No | No |

Availability capabilities
| Capability | Azure SQL Database | SQL Server in an Azure virtual machine | Azure Database for MySQL | Azure Database for PostgreSQL |
| --- | --- | --- | --- | --- |
| Readable secondaries | Yes | Yes | Yes | Yes |
| Geographic replication | Yes | Yes | Yes | Yes |
| Automatic failover to secondary | Yes | No | No | No |
| Point-in-time restore | Yes | Yes | Yes | Yes |

Security capabilities
| Capability | Azure SQL Database | SQL Server in an Azure virtual machine | Azure Database for MySQL | Azure Database for PostgreSQL |
| --- | --- | --- | --- | --- |
| Row level security | Yes | Yes | Yes | Yes |
| Data masking | Yes | Yes | No | No |
| Transparent data encryption | Yes | Yes | Yes | Yes |
| Restrict access to specific IP addresses | Yes | Yes | Yes | Yes |
| Restrict access to allow VNet access only | Yes | Yes | Yes | Yes |
| Azure Active Directory authentication | Yes | No | Yes | Yes |
| Active Directory authentication | No | Yes | No | No |
| Multi-factor authentication | Yes | No | Yes | Yes |
| Supports Always Encrypted | Yes | Yes | No | No |
| Private IP | No | Yes | No | No |

Choose a data pipeline orchestration technology in
Azure
3/10/2022 • 2 minutes to read

Most big data solutions consist of repeated data processing operations, encapsulated in workflows. A pipeline
orchestrator is a tool that helps to automate these workflows. An orchestrator can schedule jobs, execute
workflows, and coordinate dependencies among tasks.

What are your options for data pipeline orchestration?


In Azure, the following services and tools will meet the core requirements for pipeline orchestration, control
flow, and data movement:
Azure Data Factory
Oozie on HDInsight
SQL Server Integration Services (SSIS)
These services and tools can be used independently from one another, or used together to create a hybrid
solution. For example, the Integration Runtime (IR) in Azure Data Factory V2 can natively execute SSIS packages
in a managed Azure compute environment. While there is some overlap in functionality between these services,
there are a few key differences.
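As an illustration of what a Data Factory control-flow definition looks like, a minimal copy pipeline might be sketched as follows. The dataset names are hypothetical and the schema is abbreviated for readability rather than a complete, deployable definition:

```json
{
  "name": "CopyDailyLogs",
  "properties": {
    "activities": [
      {
        "name": "CopyBlobToSql",
        "type": "Copy",
        "inputs": [ { "referenceName": "RawLogsBlobDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "LogsSqlDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "BlobSource" },
          "sink": { "type": "SqlSink" }
        }
      }
    ]
  }
}
```

A real pipeline would also define the linked services and datasets it references, plus triggers for scheduling.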

Key selection criteria


To narrow the choices, start by answering these questions:
Do you need big data capabilities for moving and transforming your data? Usually this means multi-
gigabyte to terabyte volumes of data. If yes, then narrow your options to those best suited for big data.
Do you require a managed service that can operate at scale? If yes, select one of the cloud-based services
that aren't limited by your local processing power.
Are some of your data sources located on-premises? If yes, look for options that can work with both
cloud and on-premises data sources or destinations.
Is your source data stored in Blob storage or an HDFS filesystem? If so, choose an option that supports
Hive queries.

Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities
| Capability | Azure Data Factory | SQL Server Integration Services (SSIS) | Oozie on HDInsight |
| --- | --- | --- | --- |
| Managed | Yes | No | Yes |
| Cloud-based | Yes | No (local) | Yes |
| Prerequisite | Azure Subscription | SQL Server | Azure Subscription, HDInsight cluster |
| Management tools | Azure Portal, PowerShell, CLI, .NET SDK | SSMS, PowerShell | Bash shell, Oozie REST API, Oozie web UI |
| Pricing | Pay per usage | Licensing / pay for features | No additional charge on top of running the HDInsight cluster |

Pipeline capabilities
| Capability | Azure Data Factory | SQL Server Integration Services (SSIS) | Oozie on HDInsight |
| --- | --- | --- | --- |
| Copy data | Yes | Yes | Yes |
| Custom transformations | Yes | Yes | Yes (MapReduce, Pig, and Hive jobs) |
| Azure Machine Learning scoring | Yes | Yes (with scripting) | No |
| HDInsight on-demand | Yes | No | No |
| Azure Batch | Yes | No | No |
| Pig, Hive, MapReduce | Yes | No | Yes |
| Spark | Yes | No | No |
| Execute SSIS package | Yes | Yes | No |
| Control flow | Yes | Yes | Yes |
| Access on-premises data | Yes | Yes | No |

Scalability capabilities
| Capability | Azure Data Factory | SQL Server Integration Services (SSIS) | Oozie on HDInsight |
| --- | --- | --- | --- |
| Scale up | Yes | No | No |
| Scale out | Yes | No | Yes (by adding worker nodes to cluster) |
| Optimized for big data | Yes | No | Yes |


Choose a search data store in Azure
3/10/2022 • 2 minutes to read

This article compares technology choices for search data stores in Azure. A search data store is used to create
and store specialized indexes for performing searches on free-form text. The text that is indexed may reside in a
separate data store, such as blob storage. An application submits a query to the search data store, and the result
is a list of matching documents. For more information about this scenario, see Processing free-form text for
search.
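At its core, such a specialized index is an inverted index that maps each term to the documents containing it. The toy sketch below (documents invented) shows the idea; production search stores add tokenization, ranking, and distribution on top of this structure:

```python
from collections import defaultdict

# Toy inverted index: maps each term to the set of document ids containing it,
# the core structure a search data store builds and maintains at scale.
docs = {
    1: "azure blob storage stores unstructured data",
    2: "search indexes make free form text queryable",
    3: "blob storage can hold the text that gets indexed",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(*terms):
    """Return ids of documents containing all query terms (AND semantics)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []

print(search("blob", "storage"))
```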

What are your options when choosing a search data store?


In Azure, all of the following data stores will meet the core requirements for search against free-form text data
by providing a search index:
Azure Cognitive Search
Elasticsearch
HDInsight with Solr
Azure SQL Database with full text search

Key selection criteria


For search scenarios, begin choosing the appropriate search data store for your needs by answering these
questions:
Do you want a managed service rather than managing your own servers?
Can you specify your index schema at design time? If not, choose an option that supports updateable
schemas.
Do you need an index only for full-text search, or do you also need rapid aggregation of numeric data
and other analytics? If you need functionality beyond full-text search, consider options that support
additional analytics.
Do you need a search index for log analytics, with support for log collection, aggregation, and
visualizations on indexed data? If so, consider Elasticsearch, which is part of a log analytics stack.
Do you need to index data in common document formats such as PDF, Word, PowerPoint, and Excel? If
yes, choose an option that provides document indexers.
Does your database have specific security needs? If yes, consider the security features listed below.

Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities
| Capability | Cognitive Search | Elasticsearch | HDInsight with Solr | SQL Database |
| --- | --- | --- | --- | --- |
| Is managed service | Yes | No | Yes | Yes |
| REST API | Yes | Yes | Yes | No |
| Programmability | .NET, Java, Python, JavaScript | Java | Java | T-SQL |
| Document indexers for common file types (PDF, DOCX, TXT, and so on) | Yes | No | Yes | No |

Manageability capabilities
| Capability | Cognitive Search | Elasticsearch | HDInsight with Solr | SQL Database |
| --- | --- | --- | --- | --- |
| Updateable schema | Yes | Yes | Yes | Yes |
| Supports scale out | Yes | Yes | Yes | No |

Analytic workload capabilities


| Capability | Cognitive Search | Elasticsearch | HDInsight with Solr | SQL Database |
| --- | --- | --- | --- | --- |
| Supports analytics beyond full text search | No | Yes | Yes | Yes |
| Part of a log analytics stack | No | Yes (ELK) | No | No |
| Supports semantic search | Yes (find similar documents only) | Yes | Yes | Yes |

Security capabilities
| Capability | Cognitive Search | Elasticsearch | HDInsight with Solr | SQL Database |
| --- | --- | --- | --- | --- |
| Row-level security | Partial (requires application query to filter by group id) | Partial (requires application query to filter by group id) | Yes | Yes |
| Transparent data encryption | No | No | No | Yes |
| Restrict access to specific IP addresses | Yes | Yes | Yes | Yes |
| Restrict access to allow virtual network access only | Yes | Yes | Yes | Yes |
| Active Directory authentication (integrated authentication) | No | No | No | Yes |

See also
Processing free-form text for search
Transfer data to and from Azure
3/10/2022 • 7 minutes to read

There are several options for transferring data to and from Azure, depending on your needs.

Physical transfer
Using physical hardware to transfer data to Azure is a good option when:
Your network is slow or unreliable.
Getting additional network bandwidth is cost-prohibitive.
Security or organizational policies do not allow outbound connections when dealing with sensitive data.
If your primary concern is how long it will take to transfer your data, you may want to run a test to verify
whether network transfer is actually slower than physical transport.
There are two main options for physically transporting data to Azure:
Azure Impor t/Expor t . The Azure Import/Export service lets you securely transfer large amounts of data
to Azure Blob Storage or Azure Files by shipping internal SATA HDDs or SSDs to an Azure datacenter. You
can also use this service to transfer data from Azure Storage to hard disk drives and have these shipped
to you for loading on-premises.
Azure Data Box . Azure Data Box is a Microsoft-provided appliance that works much like the Azure
Import/Export service. Microsoft ships you a proprietary, secure, and tamper-resistant transfer appliance
and handles the end-to-end logistics, which you can track through the portal. One benefit of the Azure
Data Box service is ease of use. You don't need to purchase several hard drives, prepare them, and
transfer files to each one. Azure Data Box is supported by a number of industry-leading Azure partners to
make it easier to seamlessly use offline transport to the cloud from their products.

Command line tools and APIs


Consider these options when you want scripted and programmatic data transfer.
Azure CLI . The Azure CLI is a cross-platform tool that allows you to manage Azure services and upload
data to Azure Storage.
AzCopy . Use AzCopy from a Windows or Linux command-line to easily copy data to and from Azure
Blob, File, and Table storage with optimal performance. AzCopy supports concurrency and parallelism,
and the ability to resume copy operations when interrupted. You can also use AzCopy to copy data from
AWS to Azure. For programmatic access, the Microsoft Azure Storage Data Movement Library is the core
framework that powers AzCopy. It is provided as a .NET Core library.
PowerShell . The Start-AzureStorageBlobCopy PowerShell cmdlet is an option for Windows
administrators who are used to PowerShell.
AdlCopy . AdlCopy enables you to copy data from Azure Storage Blobs into Data Lake Store. It can also
be used to copy data between two Azure Data Lake Store accounts. However, it cannot be used to copy
data from Data Lake Store to Storage Blobs.
Distcp . If you have an HDInsight cluster with access to Data Lake Store, you can use Hadoop ecosystem
tools like Distcp to copy data to and from an HDInsight cluster storage (WASB) into a Data Lake Store
account.
Sqoop . Sqoop is an Apache project and part of the Hadoop ecosystem. It comes preinstalled on all
HDInsight clusters. It allows data transfer between an HDInsight cluster and relational databases such as
SQL, Oracle, MySQL, and so on. Sqoop is a collection of related tools, including import and export. Sqoop
works with HDInsight clusters using either Azure Storage blobs or Data Lake Store attached storage.
PolyBase . PolyBase is a technology that accesses data outside of the database through the T-SQL
language. In SQL Server 2016, it allows you to run queries on external data in Hadoop or to
import/export data from Azure Blob Storage. In Azure Synapse Analytics, you can import/export data
from Azure Blob Storage and Azure Data Lake Store. Currently, PolyBase is the fastest method of
importing data into Azure Synapse.
Hadoop command line . When you have data that resides on an HDInsight cluster head node, you can
use the hadoop fs -copyFromLocal command to copy that data to your cluster's attached storage, such as
Azure Storage blob or Azure Data Lake Store. In order to use the Hadoop command, you must first
connect to the head node. Once connected, you can upload a file to storage.

Graphical interface
Consider the following options if you are only transferring a few files or data objects and don't need to
automate the process.
Azure Storage Explorer . Azure Storage Explorer is a cross-platform tool that lets you manage the
contents of your Azure storage accounts. It allows you to upload, download, and manage blobs, files,
queues, tables, and Azure Cosmos DB entities. Use it with Blob storage to manage blobs and folders, as
well as upload and download blobs between your local file system and Blob storage, or between storage
accounts.
Azure por tal . Both Blob storage and Data Lake Store provide a web-based interface for exploring files
and uploading new files one at a time. This is a good option if you do not want to install any tools or issue
commands to quickly explore your files, or to simply upload a handful of new ones.

Data pipeline
Azure Data Factor y . Azure Data Factory is a managed service best suited for regularly transferring files
between a number of Azure services, on-premises, or a combination of the two. Using Azure Data Factory, you
can create and schedule data-driven workflows (called pipelines) that ingest data from disparate data stores. It
can process and transform the data by using compute services such as Azure HDInsight Hadoop, Spark, Azure
Data Lake Analytics, and Azure Machine Learning. Create data-driven workflows for orchestrating and
automating data movement and data transformation.

Key selection criteria


For data transfer scenarios, choose the appropriate system for your needs by answering these questions:
Do you need to transfer very large amounts of data, where doing so over an Internet connection would
take too long, be unreliable, or too expensive? If yes, consider physical transfer.
Do you prefer to script your data transfer tasks, so they are reusable? If so, select one of the command
line options or Azure Data Factory.
Do you need to transfer a very large amount of data over a network connection? If so, select an option
that is optimized for big data.
Do you need to transfer data to or from a relational database? If yes, choose an option that supports one
or more relational databases. Note that some of these options also require a Hadoop cluster.
Do you need an automated data pipeline or workflow orchestration? If yes, consider Azure Data Factory.

Capability matrix
The following tables summarize the key differences in capabilities.
Physical transfer
| Capability | Azure Import/Export service | Azure Data Box |
| --- | --- | --- |
| Form factor | Internal SATA HDDs or SSDs | Secure, tamper-proof, single hardware appliance |
| Microsoft manages shipping logistics | No | Yes |
| Integrates with partner products | No | Yes |
| Custom appliance | No | Yes |

Command line tools


Hadoop/HDInsight:

| Capability | Distcp | Sqoop | Hadoop CLI |
| --- | --- | --- | --- |
| Optimized for big data | Yes | Yes | Yes |
| Copy to relational database | No | Yes | No |
| Copy from relational database | No | Yes | No |
| Copy to Blob storage | Yes | Yes | Yes |
| Copy from Blob storage | Yes | Yes | No |
| Copy to Data Lake Store | Yes | Yes | Yes |
| Copy from Data Lake Store | Yes | Yes | No |

Other :

| Capability | Azure CLI | AzCopy | PowerShell | AdlCopy | PolyBase |
| --- | --- | --- | --- | --- | --- |
| Compatible platforms | Linux, OS X, Windows | Linux, Windows | Windows | Linux, OS X, Windows | SQL Server, Azure Synapse |
| Optimized for big data | No | Yes | No | Yes 1 | Yes 2 |
| Copy to relational database | No | No | No | No | Yes |
| Copy from relational database | No | No | No | No | Yes |
| Copy to Blob storage | Yes | Yes | Yes | No | Yes |
| Copy from Blob storage | Yes | Yes | Yes | Yes | Yes |
| Copy to Data Lake Store | No | Yes | Yes | Yes | Yes |
| Copy from Data Lake Store | No | No | Yes | Yes | Yes |

[1] AdlCopy is optimized for transferring big data when used with a Data Lake Analytics account.
[2] PolyBase performance can be increased by pushing computation to Hadoop and using PolyBase scale-out
groups to enable parallel data transfer between SQL Server instances and Hadoop nodes.
Graphical interface and Azure Data Factory
| Capability | Azure Storage Explorer | Azure portal * | Azure Data Factory |
| --- | --- | --- | --- |
| Optimized for big data | No | No | Yes |
| Copy to relational database | No | No | Yes |
| Copy from relational database | No | No | Yes |
| Copy to Blob storage | Yes | No | Yes |
| Copy from Blob storage | Yes | No | Yes |
| Copy to Data Lake Store | No | No | Yes |
| Copy from Data Lake Store | No | No | Yes |
| Upload to Blob storage | Yes | Yes | Yes |
| Upload to Data Lake Store | Yes | Yes | Yes |
| Orchestrate data transfers | No | No | Yes |
| Custom data transformations | No | No | Yes |
| Pricing model | Free | Free | Pay per usage |
* Azure portal in this case means using the web-based exploration tools for Blob storage and Data Lake Store.
Choose an analytical data store in Azure
3/10/2022 • 5 minutes to read

In a big data architecture, there is often a need for an analytical data store that serves processed data in a
structured format that can be queried using analytical tools. Analytical data stores that support querying of both
hot-path and cold-path data are collectively referred to as the serving layer, or data serving storage.
The serving layer deals with processed data from both the hot path and cold path. In the lambda architecture,
the serving layer is subdivided into a speed serving layer, which stores data that has been processed
incrementally, and a batch serving layer, which contains the batch-processed output. The serving layer requires
strong support for random reads with low latency. Data storage for the speed layer should also support random
writes, because batch loading data into this store would introduce undesired delays. On the other hand, data
storage for the batch layer does not need to support random writes, but batch writes instead.
There is no single best data management choice for all data storage tasks. Different data management solutions
are optimized for different tasks. Most real-world cloud apps and big data processes have a variety of data
storage requirements and often use a combination of data storage solutions.

What are your options when choosing an analytical data store?


There are several options for data serving storage in Azure, depending on your needs:
Azure Synapse Analytics
Azure Synapse Spark pools
Azure Databricks
Azure Data Explorer
Azure SQL Database
SQL Server in Azure VM
HBase/Phoenix on HDInsight
Hive LLAP on HDInsight
Azure Analysis Services
Azure Cosmos DB
These options provide various database models that are optimized for different types of tasks:
Key/value databases hold a single serialized object for each key value. They're good for storing large volumes
of data where you want to get one item for a given key value and you don't have to query based on other
properties of the item.
Document databases are key/value databases in which the values are documents. A "document" in this
context is a collection of named fields and values. The database typically stores the data in a format such as
XML, YAML, JSON, or BSON, but may use plain text. Document databases can query on non-key fields and
define secondary indexes to make querying more efficient. This makes a document database more suitable
for applications that need to retrieve data based on criteria more complex than the value of the document
key. For example, you could query on fields such as product ID, customer ID, or customer name.
Column-family databases are key/value data stores that structure data storage into collections of related
columns called column families. For example, a census database might have one group of columns for a
person's name (first, middle, last), one group for the person's address, and one group for the person's profile
information (date of birth, gender). The database can store each column family in a separate partition, while
keeping all of the data for one person related to the same key. An application can read a single column family
without reading through all of the data for an entity.
Graph databases store information as a collection of objects and relationships. A graph database can
efficiently perform queries that traverse the network of objects and the relationships between them. For
example, the objects might be employees in a human resources database, and you might want to facilitate
queries such as "find all employees who directly or indirectly work for Scott."
Telemetry and time series databases are an append-only collection of objects. Telemetry databases efficiently
index data in a variety of column stores and in-memory structures, making them the optimal choice for
storing and analyzing vast quantities of telemetry and time series data.
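To make the key/value versus document distinction above concrete, here is a small in-memory sketch (not tied to any specific Azure service; the order data is invented). A pure key/value store only answers "give me the item for this key," while a document store can also filter on fields inside the value:

```python
# Documents keyed by order ID, as a document database would store them.
orders = {
    "order-1": {"productId": "p42", "customerName": "Scott", "total": 30},
    "order-2": {"productId": "p17", "customerName": "Dana", "total": 75},
    "order-3": {"productId": "p42", "customerName": "Dana", "total": 12},
}

# Key/value access: one serialized object for a given key.
assert orders["order-2"]["total"] == 75

# Document-style query on a non-key field (product ID). A real document
# database would typically serve this from a secondary index rather than
# scanning every document as this sketch does.
by_product = [key for key, doc in orders.items() if doc["productId"] == "p42"]
print(by_product)  # ['order-1', 'order-3']
```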

Key selection criteria


To narrow the choices, start by answering these questions:
Do you need serving storage that can serve as a hot path for your data? If yes, narrow your options to
those that are optimized for a speed serving layer.
Do you need massively parallel processing (MPP) support, where queries are automatically distributed
across several processes or nodes? If yes, select an option that supports query scale out.
Do you prefer to use a relational data store? If so, narrow your options to those with a relational database
model. However, note that some non-relational stores support SQL syntax for querying, and tools such as
PolyBase can be used to query non-relational data stores.
Do you collect time series data? Do you use append-only data?

Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities

| Capability | SQL Database | Synapse SQL pool | Synapse Spark pool | Data Explorer | HBase/Phoenix on HDInsight | Hive LLAP on HDInsight | Azure Analysis Services | Cosmos DB |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Is managed service | Yes | Yes | Yes | Yes | Yes 1 | Yes 1 | Yes | Yes |
| Primary database model | Relational (columnar format when using columnstore indexes) | Relational tables with columnar storage | Wide column store | Relational (column store), telemetry, and time series store | Wide column store | Hive/In-Memory | Tabular semantic models | Document store, graph, key-value store, wide column store |
| SQL language support | Yes | Yes | Yes | Yes | Yes (using Phoenix JDBC driver) | Yes | No | Yes |
| Optimized for speed serving layer | Yes 2 | Yes 3 | Yes | Yes | Yes | Yes | No | Yes |

[1] With manual configuration and scaling.


[2] Using memory-optimized tables and hash or nonclustered indexes.
[3] Supported as an Azure Stream Analytics output.
Scalability capabilities

| Capability | SQL Database | Synapse SQL pool | Synapse Spark pool | Data Explorer | HBase/Phoenix on HDInsight | Hive LLAP on HDInsight | Azure Analysis Services | Cosmos DB |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Redundant regional servers for high availability | Yes | No | No | Yes | Yes | No | No | Yes |
| Supports query scale out | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Dynamic scalability (scale up) | Yes | Yes | Yes | Yes | No | No | Yes | Yes |
| Supports in-memory caching of data | Yes | Yes | Yes | Yes | No | Yes | Yes | No |

Security capabilities

| Capability | SQL Database | Azure Synapse | Data Explorer | HBase/Phoenix on HDInsight | Hive LLAP on HDInsight | Azure Analysis Services | Cosmos DB |
| --- | --- | --- | --- | --- | --- | --- |
| Authentication | SQL / Azure Active Directory (Azure AD) | SQL / Azure AD | Azure AD | local / Azure AD 1 | local / Azure AD 1 | Azure AD | database users / Azure AD via access control (IAM) |
| Data encryption at rest | Yes 2 | Yes 2 | Yes | Yes 1 | Yes 1 | Yes | Yes |
| Row-level security | Yes | Yes 3 | No | Yes 1 | Yes 1 | Yes | No |
| Supports firewalls | Yes | Yes | Yes | Yes 4 | Yes 4 | Yes | Yes |
| Dynamic data masking | Yes | Yes | Yes | Yes 1 | Yes | No | No |

[1] Requires using a domain-joined HDInsight cluster.


[2] Requires using transparent data encryption (TDE) to encrypt and decrypt your data at rest.
[3] Filter predicates only. See Row-Level Security
[4] When used within an Azure Virtual Network. See Extend Azure HDInsight using an Azure Virtual Network.
Choose a data analytics technology in Azure
3/10/2022 • 4 minutes to read • Edit Online

The goal of most big data solutions is to provide insights into the data through analysis and reporting. This can
include preconfigured reports and visualizations, or interactive data exploration.

What are your options when choosing a data analytics technology?


There are several options for analysis, visualizations, and reporting in Azure, depending on your needs:
Power BI
Jupyter Notebooks
Zeppelin Notebooks
Microsoft Azure Notebooks
Power BI
Power BI is a suite of business analytics tools. It can connect to hundreds of data sources, and can be used for ad
hoc analysis. See this list of the currently available data sources. Use Power BI Embedded to integrate Power BI
within your own applications without requiring any additional licensing.
Organizations can use Power BI to produce reports and publish them to the organization. Everyone can create
personalized dashboards, with governance and security built in. Power BI uses Azure Active Directory (Azure AD)
to authenticate users who log in to the Power BI service, and uses the Power BI login credentials whenever a
user attempts to access resources that require authentication.
Jupyter Notebooks
Jupyter Notebooks provide a browser-based shell that lets data scientists create notebook files that contain
Python, Scala, or R code and markdown text, making it an effective way to collaborate by sharing and
documenting code and results in a single document.
Most varieties of HDInsight clusters, such as Spark or Hadoop, come preconfigured with Jupyter notebooks for
interacting with data and submitting jobs for processing. Depending on the type of HDInsight cluster you are
using, one or more kernels will be provided for interpreting and running your code. For example, Spark clusters
on HDInsight provide Spark-related kernels that you can select from to execute Python or Scala code using the
Spark engine.
Jupyter notebooks provide a great environment for analyzing, visualizing, and processing your data prior to
building more advanced visualizations with a BI/reporting tool like Power BI.
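As a stdlib-only sketch of that kind of quick, pre-BI exploration in a notebook cell (in practice a notebook on HDInsight would more likely use pandas, matplotlib, or a Spark kernel; the latency figures here are invented sample data):

```python
# Quick descriptive statistics over a small sample, the sort of first-pass
# look at data you'd do in a notebook before building dashboards in Power BI.
import statistics

latencies_ms = [112, 98, 131, 104, 250, 99, 107]

print("n      =", len(latencies_ms))
print("mean   =", round(statistics.mean(latencies_ms), 1))
print("median =", statistics.median(latencies_ms))
print("stdev  =", round(statistics.stdev(latencies_ms), 1))
```

The gap between the mean and median immediately flags the 250 ms outlier, which is exactly the kind of finding this exploratory step is meant to surface before formal reporting.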
Zeppelin Notebooks
Zeppelin Notebooks are another option for a browser-based shell, similar to Jupyter in functionality. Some
HDInsight clusters come preconfigured with Zeppelin notebooks. However, if you are using an HDInsight
Interactive Query (Hive LLAP) cluster, Zeppelin is currently your only choice of notebook that you can use to run
interactive Hive queries. Also, if you are using a domain-joined HDInsight cluster, Zeppelin notebooks are the
only type that enables you to assign different user logins to control access to notebooks and the underlying Hive
tables.
Microsoft Azure Notebooks
Azure Notebooks is an online Jupyter Notebooks-based service that enables data scientists to create, run, and
share Jupyter Notebooks in cloud-based libraries. Azure Notebooks provides execution environments for
Python 2, Python 3, F#, and R, and provides several charting libraries for visualizing your data, such as ggplot,
matplotlib, bokeh, and seaborn.
Unlike Jupyter notebooks running on an HDInsight cluster, which are connected to the cluster's default storage
account, Azure Notebooks does not provide any data. You must load data in a variety of ways, such as downloading
data from an online source, interacting with Azure Blobs or Table Storage, connecting to a SQL database, or
loading data with the Copy Wizard for Azure Data Factory.
Key benefits:
Free service—no Azure subscription required.
No need to install Jupyter and the supporting R or Python distributions locally—just use a browser.
Manage your own online libraries and access them from any device.
Share your notebooks with collaborators.
Considerations:
You will be unable to access your notebooks when offline.
Limited processing capabilities of the free notebook service may not be enough to train large or complex
models.

Key selection criteria


To narrow the choices, start by answering these questions:
Do you need to connect to numerous data sources, providing a centralized place to create reports for
data spread throughout your domain? If so, choose an option that allows you to connect to 100s of data
sources.
Do you want to embed dynamic visualizations in an external website or application? If so, choose an
option that provides embedding capabilities.
Do you want to design your visualizations and reports while offline? If yes, choose an option with offline
capabilities.
Do you need heavy processing power to train large or complex AI models or work with very large data
sets? If yes, choose an option that can connect to a big data cluster.

Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities

| Capability | Power BI | Jupyter Notebooks | Zeppelin Notebooks | Microsoft Azure Notebooks |
| --- | --- | --- | --- | --- |
| Connect to big data cluster for advanced processing | Yes | Yes | Yes | No |
| Managed service | Yes | Yes 1 | Yes 1 | Yes |
| Connect to 100s of data sources | Yes | No | No | No |
| Offline capabilities | Yes 2 | No | No | No |
| Embedding capabilities | Yes | No | No | No |
| Automatic data refresh | Yes | No | No | No |
| Access to numerous open source packages | No | Yes 3 | Yes 3 | Yes 4 |
| Data transformation/cleansing options | Power Query, R | 40 languages, including Python, R, Julia, and Scala | 20+ interpreters, including Python, JDBC, and R | Python, F#, R |
| Pricing | Free for Power BI Desktop (authoring); see pricing for hosting options | Free | Free | Free |
| Multiuser collaboration | Yes | Yes (through sharing or with a multiuser server like JupyterHub) | Yes | Yes (through sharing) |

[1] When used as part of a managed HDInsight cluster.
[2] With the use of Power BI Desktop.
[3] You can search the Maven repository for community-contributed packages.
[4] Python packages can be installed using either pip or conda. R packages can be installed from CRAN or GitHub. Packages in F# can be installed via nuget.org using the Paket dependency manager.
Choose a batch processing technology in Azure
3/10/2022 • 3 minutes to read • Edit Online

Big data solutions often use long-running batch jobs to filter, aggregate, and otherwise prepare the data for
analysis. Usually these jobs involve reading source files from scalable storage (like HDFS, Azure Data Lake Store,
and Azure Storage), processing them, and writing the output to new files in scalable storage.
The key requirement of such batch processing engines is the ability to scale out computations, in order to
handle a large volume of data. Unlike real-time processing, however, batch processing is expected to have
latencies (the time between data ingestion and computing a result) that measure in minutes to hours.
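The read-process-write shape described above can be sketched in miniature. This is illustrative only: the field names are invented, and the in-memory strings stand in for files in Blob storage, Data Lake Store, or HDFS, which is where a real batch job would read and write:

```python
# Minimal batch-job shape: read source records, aggregate them, and write
# the output in a new format for downstream analysis.
import csv
import io
from collections import Counter

# Stand-in for raw files landed in scalable storage.
raw = "page,views\nhome,3\npricing,1\nhome,2\n"

# Process: aggregate views per page.
totals = Counter()
for row in csv.DictReader(io.StringIO(raw)):
    totals[row["page"]] += int(row["views"])

# Write the output: serialize the aggregate for the serving layer.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["page", "total_views"])
for page, total in sorted(totals.items()):
    writer.writerow([page, total])

print(out.getvalue())
```

Engines such as Spark or Hive run this same pattern in parallel across many nodes and files, which is what the scale-out requirement above refers to.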

Technology choices for batch processing


Azure Synapse Analytics
Azure Synapse is a distributed system designed to perform analytics on large data. It supports massively parallel
processing (MPP), which makes it suitable for running high-performance analytics. Consider Azure Synapse
when you have large amounts of data (more than 1 TB) and are running an analytics workload that will benefit
from parallelism.
Azure Data Lake Analytics
Data Lake Analytics is an on-demand analytics job service. It is optimized for distributed processing of very
large data sets stored in Azure Data Lake Store.
Languages: U-SQL (including Python, R, and C# extensions).
Integrates with Azure Data Lake Store, Azure Storage blobs, Azure SQL Database, and Azure Synapse.
Pricing model is per-job.
HDInsight
HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch
processing, you can use Spark, Hive, Hive LLAP, or MapReduce.
Languages: R, Python, Java, Scala, SQL
Kerberos authentication with Active Directory, Apache Ranger based access control
Gives you full control of the Hadoop cluster
Azure Databricks
Azure Databricks is an Apache Spark-based analytics platform. You can think of it as "Spark as a service." It's the
easiest way to use Spark on the Azure platform.
Languages: R, Python, Java, Scala, Spark SQL
Fast cluster start times, autotermination, autoscaling.
Manages the Spark cluster for you.
Built-in integration with Azure Blob Storage, Azure Data Lake Storage (ADLS), Azure Synapse, and other
services. See Data Sources.
User authentication with Azure Active Directory.
Web-based notebooks for collaboration and data exploration.
Supports GPU-enabled clusters
Azure Distributed Data Engineering Toolkit
The Distributed Data Engineering Toolkit (AZTK) is a tool for provisioning on-demand Spark on Docker clusters
in Azure.
AZTK is not an Azure service. Rather, it's a client-side tool with a CLI and Python SDK interface, that's built on
Azure Batch. This option gives you the most control over the infrastructure when deploying a Spark cluster.
Bring your own Docker image.
Use low-priority VMs for an 80% discount.
Mixed mode clusters that use both low-priority and dedicated VMs.
Built in support for Azure Blob Storage and Azure Data Lake connection.

Key selection criteria


To narrow the choices, start by answering these questions:
Do you want a managed service rather than managing your own servers?
Do you want to author batch processing logic declaratively or imperatively?
Will you perform batch processing in bursts? If yes, consider options that let you auto-terminate the
cluster or whose pricing model is per batch job.
Do you need to query relational data stores along with your batch processing, for example to look up
reference data? If yes, consider the options that enable querying of external relational stores.

Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities

| Capability | Azure Data Lake Analytics | Azure Synapse | HDInsight | Azure Databricks |
| --- | --- | --- | --- | --- |
| Is managed service | Yes | Yes | Yes 1 | Yes |
| Relational data store | Yes | Yes | No | No |
| Pricing model | Per batch job | By cluster hour | By cluster hour | Databricks Unit 2 + cluster hour |

[1] With manual configuration.
[2] A Databricks Unit (DBU) is a unit of processing capability per hour.
Capabilities

| Capability | Azure Data Lake Analytics | Azure Synapse | HDInsight with Spark | HDInsight with Hive | HDInsight with Hive LLAP | Azure Databricks |
| --- | --- | --- | --- | --- | --- | --- |
| Autoscaling | No | No | Yes | Yes | Yes | Yes |
| Scale-out granularity | Per job | Per cluster | Per cluster | Per cluster | Per cluster | Per cluster |
| In-memory caching of data | No | Yes | Yes | No | Yes | Yes |
| Query from external relational stores | Yes | No | Yes | No | No | Yes |
| Authentication | Azure AD | SQL / Azure AD | No | Azure AD 1 | Azure AD 1 | Azure AD |
| Auditing | Yes | Yes | No | Yes 1 | Yes 1 | Yes |
| Row-level security | No | Yes 2 | No | Yes 1 | Yes 1 | No |
| Supports firewalls | Yes | Yes | Yes | Yes 3 | Yes 3 | No |
| Dynamic data masking | No | Yes | No | Yes 1 | Yes 1 | No |


[1] Requires using a domain-joined HDInsight cluster.


[2] Filter predicates only. See Row-Level Security
[3] Supported when used within an Azure Virtual Network.

Next steps
Analytics architecture design
Choose an analytical data store in Azure
Choose a data analytics technology in Azure
Analytics end-to-end with Azure Synapse
Choose a stream processing technology in Azure
3/10/2022 • 2 minutes to read • Edit Online

This article compares technology choices for real-time stream processing in Azure.
Real-time stream processing consumes messages from either queue or file-based storage, processes the
messages, and forwards the result to another message queue, file store, or database. Processing may include
querying, filtering, and aggregating messages. Stream processing engines must be able to consume endless
streams of data and produce results with minimal latency. For more information, see Real time processing.
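The consume-filter-aggregate-forward loop described above can be sketched with a tumbling (fixed, non-overlapping) window, one of the common temporal-processing patterns these engines provide. The message shape, sensor names, and window size here are invented for illustration:

```python
# Tumbling-window aggregation over a stream of messages: group events into
# fixed 60-second windows and emit one count per sensor per window.
from collections import defaultdict

def tumbling_windows(messages, window_seconds=60):
    """Assign each message to the window containing its timestamp and
    aggregate an event count per sensor within each window."""
    windows = defaultdict(lambda: defaultdict(int))
    for msg in messages:
        window_start = msg["ts"] - (msg["ts"] % window_seconds)
        windows[window_start][msg["sensor"]] += 1
    # A real engine emits each window's result as its watermark passes;
    # here we just return the finished windows for a bounded sample.
    return {w: dict(counts) for w, counts in sorted(windows.items())}

sample = [
    {"ts": 5, "sensor": "a"}, {"ts": 42, "sensor": "a"},
    {"ts": 61, "sensor": "b"}, {"ts": 118, "sensor": "a"},
]
print(tumbling_windows(sample))  # {0: {'a': 2}, 60: {'b': 1, 'a': 1}}
```

Stream processing engines differ mainly in what surrounds this core: declarative versus imperative authoring, built-in windowing, late-arrival handling, and how far the processing scales, which is what the capability matrix below compares.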

What are your options when choosing a technology for real-time


processing?
In Azure, all of the following services meet the core requirements for real-time processing:
Azure Stream Analytics
HDInsight with Spark Streaming
Apache Spark in Azure Databricks
HDInsight with Storm
Azure Functions
Azure App Service WebJobs
Apache Kafka streams API

Key Selection Criteria


For real-time processing scenarios, begin choosing the appropriate service for your needs by answering these
questions:
Do you prefer a declarative or imperative approach to authoring stream processing logic?
Do you need built-in support for temporal processing or windowing?
Does your data arrive in formats besides Avro, JSON, or CSV? If yes, consider options that support any
format using custom code.
Do you need to scale your processing beyond 1 GB/s? If yes, consider the options that scale with the
cluster size.

Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities

| Capability | Azure Stream Analytics | HDInsight with Spark Streaming | Apache Spark in Azure Databricks | HDInsight with Storm | Azure Functions | Azure App Service WebJobs |
| --- | --- | --- | --- | --- | --- | --- |
| Programmability | Stream analytics query language, JavaScript | C#/F#, Java, Python, Scala | C#/F#, Java, Python, R, Scala | C#, Java | C#, F#, Java, Node.js, Python | C#, Java, Node.js, PHP, Python |
| Programming paradigm | Declarative | Mixture of declarative and imperative | Mixture of declarative and imperative | Imperative | Imperative | Imperative |
| Pricing model | Streaming units | Per cluster hour | Databricks units | Per cluster hour | Per function execution and resource consumption | Per app service plan hour |

Integration capabilities

| Capability | Azure Stream Analytics | HDInsight with Spark Streaming | Apache Spark in Azure Databricks | HDInsight with Storm | Azure Functions | Azure App Service WebJobs |
| --- | --- | --- | --- | --- | --- | --- |
| Inputs | Azure Event Hubs, Azure IoT Hub, Azure Blob storage | Event Hubs, IoT Hub, Kafka, HDFS, Storage Blobs, Azure Data Lake Store | Event Hubs, IoT Hub, Kafka, HDFS, Storage Blobs, Azure Data Lake Store | Event Hubs, IoT Hub, Storage Blobs, Azure Data Lake Store | Supported bindings | Service Bus, Storage Queues, Storage Blobs, Event Hubs, WebHooks, Cosmos DB, Files |
| Sinks | Azure Data Lake Store, Azure SQL Database, Storage Blobs, Event Hubs, Power BI, Table Storage, Service Bus Queues, Service Bus Topics, Cosmos DB, Azure Functions | HDFS, Kafka, Storage Blobs, Azure Data Lake Store, Cosmos DB | HDFS, Kafka, Storage Blobs, Azure Data Lake Store, Cosmos DB | Event Hubs, Service Bus, Kafka | Supported bindings | Service Bus, Storage Queues, Storage Blobs, Event Hubs, WebHooks, Cosmos DB, Files |

Processing capabilities

| Capability | Azure Stream Analytics | HDInsight with Spark Streaming | Apache Spark in Azure Databricks | HDInsight with Storm | Azure Functions | Azure App Service WebJobs |
| --- | --- | --- | --- | --- | --- | --- |
| Built-in temporal/windowing support | Yes | Yes | Yes | Yes | No | No |
| Input data formats | Avro, JSON or CSV, UTF-8 encoded | Any format using custom code | Any format using custom code | Any format using custom code | Any format using custom code | Any format using custom code |
| Scalability | Query partitions | Bounded by cluster size | Bounded by Databricks cluster scale configuration | Bounded by cluster size | Up to 200 function app instances processing in parallel | Bounded by app service plan capacity |
| Late arrival and out of order event handling support | Yes | Yes | Yes | Yes | No | No |

See also:
Choosing a real-time message ingestion technology
Real time processing
Choose a Microsoft cognitive services technology
3/10/2022 • 3 minutes to read • Edit Online

Microsoft cognitive services are cloud-based APIs that you can use in artificial intelligence (AI) applications and
data flows. They provide you with pretrained models that are ready to use in your application, requiring no data
and no model training on your part. The cognitive services are developed by Microsoft's AI and Research team
and leverage the latest deep learning algorithms. They are consumed over HTTP REST interfaces. In addition,
SDKs are available for many common application development frameworks.
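A call over these REST interfaces is typically shaped like the sketch below. The endpoint host and JSON body are illustrative placeholders (you would substitute your own resource's endpoint and key); the `Ocp-Apim-Subscription-Key` header is how these APIs accept the resource key. The request is only constructed here, not sent:

```python
# Build (but do not send) a Cognitive Services-style REST request for
# sentiment analysis over a batch of documents.
import json
import urllib.request

endpoint = "https://example.cognitiveservices.azure.com/text/analytics/v3.0/sentiment"  # placeholder resource
body = {"documents": [{"id": "1", "language": "en", "text": "Great service!"}]}

req = urllib.request.Request(
    endpoint,
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Ocp-Apim-Subscription-Key": "<your-resource-key>",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the call; the service responds
# with JSON scores per document.
print(req.method, req.full_url)
```

The SDKs wrap exactly this request/response exchange in language-native client objects, so most applications never construct the HTTP call by hand.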
The cognitive services include:
Text analysis
Computer vision
Video analytics
Speech recognition and generation
Natural language understanding
Intelligent search
Key benefits:
Minimal development effort for state-of-the-art AI services.
Easy integration into apps via HTTP REST interfaces.
Built-in support for consuming cognitive services in Azure Data Lake Analytics.
Considerations:
Only available over the web. Internet connectivity is generally required. An exception is the Custom Vision
Service, whose trained model you can export for prediction on devices and at the IoT edge.
Although considerable customization is supported, the available services may not suit all predictive
analytics requirements.

What are your options when choosing amongst the cognitive


services?
In Azure, there are dozens of Cognitive Services available. The current listing of these is available in a directory
categorized by the functional area they support:
Vision
Speech
Decision
Search
Language

Key selection criteria


To narrow the choices, start by answering these questions:
What type of data are you dealing with? Narrow your options based on the type of input data you are
working with. For example, if your input is text, select from the services that have an input type of text.
Do you have the data to train a model? If yes, consider the custom services that enable you to train their
underlying models with data that you provide, for improved accuracy and performance.

Capability matrix
The following tables summarize the key differences in capabilities.
Uses prebuilt models

| Capability | Input type | Key benefit |
| --- | --- | --- |
| Text Analytics API | Text | Evaluate sentiment and topics to understand what users want. |
| Entity Linking API | Text | Power your app's data links with named entity recognition and disambiguation. |
| Language Understanding Intelligent Service (LUIS) | Text | Teach your apps to understand commands from your users. |
| QnA Maker Service | Text | Distill FAQ formatted information into conversational, easy-to-navigate answers. |
| Linguistic Analysis API | Text | Simplify complex language concepts and parse text. |
| Knowledge Exploration Service | Text | Enable interactive search experiences over structured data via natural language inputs. |
| Web Language Model API | Text | Use predictive language models trained on web-scale data. |
| Academic Knowledge API | Text | Tap into the wealth of academic content in the Microsoft Academic Graph populated by Bing. |
| Bing Autosuggest API | Text | Give your app intelligent autosuggest options for searches. |
| Bing Spell Check API | Text | Detect and correct spelling mistakes in your app. |
| Translator Text API | Text | Machine translation. |
| Recommendations API | Text | Predict and recommend items your customers want. |
| Bing Entity Search API | Text (web search query) | Identify and augment entity information from the web. |
| Bing Image Search API | Text (web search query) | Search for images. |
| Bing News Search API | Text (web search query) | Search for news. |
| Bing Video Search API | Text (web search query) | Search for videos. |
| Bing Web Search API | Text (web search query) | Get enhanced search details from billions of web documents. |
| Bing Speech API | Text or Speech | Convert speech to text and back again. |
| Speaker Recognition API | Speech | Use speech to identify and authenticate individual speakers. |
| Translator Speech API | Speech | Perform real-time speech translation. |
| Computer Vision API | Images (or frames from video) | Distill actionable information from images, automatically create descriptions of photos, derive tags, recognize celebrities, extract text, and create accurate thumbnails. |
| Content Moderator | Text, Images, or Video | Automated image, text, and video moderation. |
| Emotion API | Images (photos with human subjects) | Identify the range of emotions of human subjects. |
| Face API | Images (photos with human subjects) | Detect, identify, analyze, organize, and tag faces in photos. |
| Video Indexer | Video | Video insights such as sentiment, speech transcription, speech translation, face and emotion recognition, and keyword extraction. |
Trained with custom data you provide

| Capability | Input type | Key benefit |
| --- | --- | --- |
| Custom Vision Service | Images (or frames from video) | Customize your own computer vision models. |
| Custom Speech Service | Speech | Overcome speech recognition barriers like speaking style, background noise, and vocabulary. |
| Custom Decision Service | Web content (for example, RSS feed) | Use machine learning to automatically select the appropriate content for your home page. |
| Bing Custom Search API | Text (web search query) | Commercial-grade search tool. |
Compare the machine learning products and
technologies from Microsoft
3/10/2022 • 8 minutes to read • Edit Online

Learn about the machine learning products and technologies from Microsoft. Compare options to help you
choose how to most effectively build, deploy, and manage your machine learning solutions.

Cloud-based machine learning products


The following options are available for machine learning in the Azure cloud.

| Cloud options | What it is | What you can do with it |
| --- | --- | --- |
| Azure Machine Learning | Managed platform for machine learning | Use a pretrained model, or train, deploy, and manage models on Azure using Python and the CLI |
| Azure Cognitive Services | Pre-built AI capabilities implemented through REST APIs and SDKs | Build intelligent applications quickly using standard programming languages; doesn't require machine learning and data science expertise |
| Azure SQL Managed Instance Machine Learning Services | In-database machine learning for SQL | Train and deploy models inside Azure SQL Managed Instance |
| Machine learning in Azure Synapse Analytics | Analytics service with machine learning | Train and deploy models inside Azure Synapse Analytics |
| Machine learning and AI with ONNX in Azure SQL Edge | Machine learning in SQL on IoT | Train and deploy models inside Azure SQL Edge |
| Azure Databricks | Apache Spark-based analytics platform | Build and deploy models and data workflows using integrations with open-source machine learning libraries and the MLflow platform |

On-premises machine learning products


The following options are available for machine learning on-premises. On-premises servers can also run in a
virtual machine in the cloud.

| On-premises options | What it is | What you can do with it |
| --- | --- | --- |
| SQL Server Machine Learning Services | In-database machine learning for SQL | Train and deploy models inside SQL Server |
| Machine Learning Services on SQL Server Big Data Clusters | Machine learning in Big Data Clusters | Train and deploy models on SQL Server Big Data Clusters |

Development platforms and tools


The following development platforms and tools are available for machine learning.

| Platforms/tools | What it is | What you can do with it |
| --- | --- | --- |
| Azure Data Science Virtual Machine | Virtual machine with pre-installed data science tools | Develop machine learning solutions in a pre-configured environment |
| ML.NET | Open-source, cross-platform machine learning SDK | Develop machine learning solutions for .NET applications |
| Windows ML | Windows 10 machine learning platform | Evaluate trained models on a Windows 10 device |
| MMLSpark | Open-source, distributed machine learning and microservices framework for Apache Spark | Create and deploy scalable machine learning applications for Scala and Python |
| Machine Learning extension for Azure Data Studio | Open-source and cross-platform machine learning extension for Azure Data Studio | Manage packages, import machine learning models, make predictions, and create notebooks to run experiments for your SQL databases |

Azure Machine Learning


Azure Machine Learning is a fully managed cloud service used to train, deploy, and manage machine learning
models at scale. It fully supports open-source technologies, so you can use tens of thousands of open-source
Python packages such as TensorFlow, PyTorch, and scikit-learn. Rich tools are also available, such as Compute
instances, Jupyter notebooks, or the Azure Machine Learning for Visual Studio Code extension, a free extension
that allows you to manage your resources, model training workflows and deployments in Visual Studio Code.
Azure Machine Learning includes features that automate model generation and tuning with ease, efficiency, and
accuracy.
Use Python SDK, Jupyter notebooks, R, and the CLI for machine learning at cloud scale. For a low-code or no-
code option, use Azure Machine Learning's interactive designer in the studio to easily and quickly build, test, and
deploy models using pre-built machine learning algorithms.
Try Azure Machine Learning for free.

| | |
| --- | --- |
| Type | Cloud-based machine learning solution |
| Supported languages | Python, R |
| Machine learning phases | Model training, deployment, MLOps/management |
| Key benefits | Code-first (SDK) and studio drag-and-drop designer web interface authoring options. Central management of scripts and run history, making it easy to compare model versions. Easy deployment and management of models to the cloud or edge devices. |
| Considerations | Requires some familiarity with the model management model. |

Azure Cognitive Services


Azure Cognitive Services is a set of pre-built APIs that enable you to build apps that use natural methods of
communication. The term pre-built suggests that you do not need to bring datasets or data science expertise to
train models to use in your applications. That's all done for you and packaged as APIs and SDKs that allow your
apps to see, hear, speak, understand, and interpret user needs with just a few lines of code. You can easily add
intelligent features to your apps, such as:
Vision: Object detection, face recognition, OCR, etc. See Computer Vision, Face, Form Recognizer.
Speech: Speech-to-text, text-to-speech, speaker recognition, etc. See Speech Service.
Language: Translation, Sentiment analysis, key phrase extraction, language understanding, etc. See
Translator, Text Analytics, Language Understanding, QnA Maker.
Decision: Anomaly detection, content moderation, reinforcement learning. See Anomaly Detector, Content
Moderator, Personalizer.
Use Cognitive Services to develop apps across devices and platforms. The APIs keep improving, and are easy to
set up.

Type: APIs for building intelligent applications

Supported languages: Various options depending on the service. Standard ones are C#, Java, JavaScript, and Python.

Machine learning phases: Deployment

Key benefits: Build intelligent applications using pre-trained models available through REST API and SDK; variety of models for natural communication methods with vision, speech, language, and decision; no machine learning or data science expertise required.
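Because Cognitive Services are plain REST APIs, "a few lines of code" is literal. The sketch below builds (but does not send) a Text Analytics v3.1 sentiment request using only the Python standard library; the endpoint host and key are placeholders for your own resource.

```python
import json
import urllib.request

def build_sentiment_request(endpoint: str, key: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a Text Analytics v3.1 sentiment request."""
    body = {"documents": [{"id": "1", "language": "en", "text": text}]}
    return urllib.request.Request(
        url=endpoint + "/text/analytics/v3.1/sentiment",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,  # your resource's API key
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Endpoint and key here are illustrative placeholders.
req = build_sentiment_request(
    "https://my-resource.cognitiveservices.azure.com", "<key>", "The API was easy to set up."
)
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return a JSON document scoring the text as positive, neutral, or negative.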

SQL machine learning


SQL machine learning adds statistical analysis, data visualization, and predictive analytics in Python and R for
relational data, both on-premises and in the cloud. Current platforms and tools include:
SQL Server Machine Learning Services
Machine Learning Services on SQL Server Big Data Clusters
Azure SQL Managed Instance Machine Learning Services
Machine learning in Azure Synapse Analytics
Machine learning and AI with ONNX in Azure SQL Edge
Machine Learning extension for Azure Data Studio
Use SQL machine learning when you need built-in AI and predictive analytics on relational data in SQL.

Type: On-premises predictive analytics for relational data

Supported languages: Python, R, SQL

Machine learning phases: Data preparation, Model training, Deployment

Key benefits: Encapsulate predictive logic in a database function, making it easy to include in data-tier logic.

Considerations: Assumes a SQL database as the data tier for your application.

Azure Data Science Virtual Machine


The Azure Data Science Virtual Machine is a customized virtual machine environment on the Microsoft Azure
cloud. It is available in versions for both Windows and Linux Ubuntu. The environment is built specifically for
doing data science and developing ML solutions. It has many popular data science, ML frameworks, and other
tools pre-installed and pre-configured to jump-start building intelligent applications for advanced analytics.
Use the Data Science VM when you need to run or host your jobs on a single node, or when you need to remotely scale up your processing on a single machine.

Type: Customized virtual machine environment for data science

Key benefits: Reduced time to install, manage, and troubleshoot data science tools and frameworks; the latest versions of all commonly used tools and frameworks are included; virtual machine options include highly scalable images with GPU capabilities for intensive data modeling.

Considerations: The virtual machine cannot be accessed when offline; running a virtual machine incurs Azure charges, so you must be careful to have it running only when required.

Azure Databricks
Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services
platform. Databricks is integrated with Azure to provide one-click setup, streamlined workflows, and an
interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.
Use Python, R, Scala, and SQL code in web-based notebooks to query, visualize, and model data.
Use Databricks when you want to collaborate on building machine learning solutions on Apache Spark.

Type: Apache Spark-based analytics platform

Supported languages: Python, R, Scala, SQL

Machine learning phases: Data preparation, Data preprocessing, Model training, Model tuning, Model inference, Management, Deployment

ML.NET
ML.NET is an open-source, cross-platform machine learning framework. With ML.NET, you can build custom
machine learning solutions and integrate them into your .NET applications. ML.NET offers varying levels of
interoperability with popular frameworks like TensorFlow and ONNX for training and scoring machine learning
and deep learning models. For resource-intensive tasks like training image classification models, you can take
advantage of Azure to train your models in the cloud.
Use ML.NET when you want to integrate machine learning solutions into your .NET applications. Choose
between the API for a code-first experience and Model Builder or the CLI for a low-code experience.

Type: Open-source cross-platform framework for developing custom machine learning applications with .NET

Languages supported: C#, F#

Machine learning phases: Data preparation, Training, Deployment

Key benefits: Data science & ML experience not required; use familiar tools (Visual Studio, VS Code) and languages; deploy where .NET runs; extensible; scalable; local-first experience.

Windows ML
The Windows ML inference engine allows you to use trained machine learning models in your applications,
evaluating trained models locally on Windows 10 devices.
Use Windows ML when you want to use trained machine learning models within your Windows applications.

Type: Inference engine for trained models in Windows devices

Languages supported: C#/C++, JavaScript

MMLSpark
Microsoft ML for Apache Spark (MMLSpark) is an open-source library that expands the distributed computing
framework Apache Spark. MMLSpark adds many deep learning and data science tools to the Spark ecosystem,
including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK),
LightGBM, LIME (Model Interpretability), and OpenCV. You can use these tools to create powerful predictive
models on any Spark cluster, such as Azure Databricks or Cosmic Spark.
MMLSpark also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project,
users can embed any web service into their SparkML models. Additionally, MMLSpark provides easy-to-use
tools for orchestrating Azure Cognitive Services at scale. For production-grade deployment, the Spark Serving
project enables high-throughput, sub-millisecond-latency web services, backed by your Spark cluster.

Type: Open-source, distributed machine learning and microservices framework for Apache Spark

Languages supported: Scala 2.11, Java, Python 3.5+, R (beta)

Machine learning phases: Data preparation, Model training, Deployment

Key benefits: Scalability; Streaming + Serving compatible; Fault-tolerance

Considerations: Requires Apache Spark

Next steps
To learn about all the Artificial Intelligence (AI) development products available from Microsoft, see Microsoft
AI platform.
For training in developing AI and Machine Learning solutions with Microsoft, see Microsoft Learn.
Choose a real-time message ingestion technology
in Azure
3/10/2022 • 2 minutes to read

Real-time processing deals with streams of data that are captured in real time and processed with minimal
latency. Many real-time processing solutions need a message ingestion store to act as a buffer for messages,
and to support scale-out processing, reliable delivery, and other message queuing semantics.

What are your options for real-time message ingestion?


Azure Event Hubs
Azure IoT Hub
Kafka on HDInsight

Azure Event Hubs


Azure Event Hubs is a highly scalable data streaming platform and event ingestion service, capable of receiving
and processing millions of events per second. Event Hubs can process and store events, data, or telemetry
produced by distributed software and devices. Data sent to an event hub can be transformed and stored using
any real-time analytics provider or batching/storage adapters. Event Hubs provides publish-subscribe
capabilities with low latency at massive scale, which makes it appropriate for big data scenarios.
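Event Hubs distributes incoming events across partitions; events that share a partition key land on the same partition, which preserves their relative order. The sketch below models that behavior with a simple stable hash. It is illustrative only and is not the service's actual assignment algorithm.

```python
import hashlib

def partition_for(partition_key: str, partition_count: int = 4) -> int:
    """Map a partition key to a partition index with a stable hash.
    Illustrative only; Event Hubs uses its own internal hashing."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# Events with the same key always map to the same partition,
# so per-device ordering is preserved across a stream.
events = [("device-1", "temp=20"), ("device-2", "temp=31"), ("device-1", "temp=21")]
placement = [(key, partition_for(key)) for key, _ in events]
print(placement)
```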

Azure IoT Hub


Azure IoT Hub is a managed service that enables reliable and secure bidirectional communications between
millions of IoT devices and a cloud-based back end.
Features of IoT Hub include:
Multiple options for device-to-cloud and cloud-to-device communication. These options include one-way
messaging, file transfer, and request-reply methods.
Message routing to other Azure services.
Queryable store for device metadata and synchronized state information.
Secure communications and access control using per-device security keys or X.509 certificates.
Monitoring of device connectivity and device identity management events.
In terms of message ingestion, IoT Hub is similar to Event Hubs. However, it was specifically designed for
managing IoT device connectivity, not just message ingestion. For more information, see Comparison of Azure
IoT Hub and Azure Event Hubs.

Kafka on HDInsight
Apache Kafka is an open-source distributed streaming platform that can be used to build real-time data
pipelines and streaming applications. Kafka also provides message broker functionality similar to a message
queue, where you can publish and subscribe to named data streams. It is horizontally scalable, fault-tolerant, and
extremely fast. Kafka on HDInsight provides Kafka as a managed, highly scalable, and highly available service in Azure.
Some common use cases for Kafka are:
Messaging. Because it supports the publish-subscribe message pattern, Kafka is often used as a message broker.
Activity tracking. Because Kafka provides in-order logging of records, it can be used to track and re-create activities, such as user actions on a web site.
Aggregation. Using stream processing, you can aggregate information from different streams to combine and centralize the information into operational data.
Transformation. Using stream processing, you can combine and enrich data from multiple input topics into one or more output topics.
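The publish-subscribe pattern behind the messaging use case above can be sketched in a few lines. This is an in-memory stand-in for illustration only, not a Kafka client; the topic names and handlers are made up, and real Kafka adds partitioning, persistence, and consumer groups on top of this idea.

```python
from collections import defaultdict
from typing import Callable

class MiniBroker:
    """In-memory publish-subscribe broker illustrating the pattern."""
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        # Register a handler for a named data stream (topic).
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> None:
        # Deliver the message to every subscriber of that topic.
        for handler in self._subscribers[topic]:
            handler(message)

broker = MiniBroker()
received: list[str] = []
broker.subscribe("clicks", received.append)
broker.publish("clicks", "user-42 clicked /home")
broker.publish("orders", "order-7 placed")  # no subscriber for this topic
print(received)
```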

Key selection criteria


To narrow the choices, start by answering these questions:
Do you need two-way communication between your IoT devices and Azure? If so, choose IoT Hub.
Do you need to manage access for individual devices and be able to revoke access to a specific device? If
yes, choose IoT Hub.

Capability matrix
The following tables summarize the key differences in capabilities.

CAPABILITY | IOT HUB | EVENT HUBS | KAFKA ON HDINSIGHT
Cloud-to-device communications | Yes | No | No
Device-initiated file upload | Yes | No | No
Device state information | Device twins | No | No
Protocol support | MQTT, AMQP, HTTPS [1] | AMQP, HTTPS, Kafka Protocol | Kafka Protocol
Security | Per-device identity; revocable access control | Shared access policies; limited revocation through publisher policies | Authentication using SASL; pluggable authorization; integration with external authentication services supported
[1] You can also use Azure IoT protocol gateway as a custom gateway to enable protocol adaptation for IoT Hub.
For more information, see Comparison of Azure IoT Hub and Azure Event Hubs.
Best practices in cloud applications
3/10/2022 • 3 minutes to read

These best practices can help you build reliable, scalable, and secure applications in the cloud. They offer
guidelines and tips for designing and implementing efficient and robust systems, mechanisms, and approaches.
Many also include code examples that you can use with Azure services. The practices apply to any distributed
system, whether your host is Azure or a different cloud platform.

Catalog of practices
This table lists various best practices. The Related pillars or patterns column contains the following links:
Cloud development challenges that the practice and related design patterns address.
Pillars of the Microsoft Azure Well-Architected Framework that the practice focuses on.

API design: Design web APIs to support platform independence by using standard protocols and agreed-upon data formats. Promote service evolution so that clients can discover functionality without requiring modification. Improve response times and prevent transient faults by supporting partial responses and providing ways to filter and paginate data. Related pillars or patterns: Design and implementation, Performance efficiency, Operational excellence.

API implementation: Implement web APIs to be efficient, responsive, scalable, and available. Make actions idempotent, support content negotiation, and follow the HTTP specification. Handle exceptions, and support the discovery of resources. Provide ways to handle large requests and minimize network traffic. Related pillars or patterns: Design and implementation, Operational excellence.

Autoscaling: Design apps to dynamically allocate and de-allocate resources to satisfy performance requirements and minimize costs. Take advantage of Azure Monitor autoscale and the built-in autoscaling that many Azure components offer. Related pillars or patterns: Performance efficiency, Cost optimization.

Background jobs: Implement batch jobs, processing tasks, and workflows as background jobs. Use Azure platform services to host these tasks. Trigger tasks with events or schedules, and return results to calling tasks. Related pillars or patterns: Design and implementation, Operational excellence.

Caching: Improve performance by copying data to fast storage that's close to apps. Cache data that you read often but rarely modify. Manage data expiration and concurrency. See how to populate caches and use the Azure Cache for Redis service. Related pillars or patterns: Data management, Performance efficiency.

Content delivery network: Use content delivery networks (CDNs) to efficiently deliver web content to users and reduce load on web apps. Overcome deployment, versioning, security, and resilience challenges. Related pillars or patterns: Data management, Performance efficiency.

Data partitioning: Partition data to improve scalability, availability, and performance, and to reduce contention and data storage costs. Use horizontal, vertical, and functional partitioning in efficient ways. Related pillars or patterns: Data management, Performance efficiency, Cost optimization.

Data partitioning strategies (by service): Partition data in Azure SQL Database and Azure Storage services like Azure Table Storage and Azure Blob Storage. Shard your data to distribute loads, reduce latency, and support horizontal scaling. Related pillars or patterns: Data management, Performance efficiency, Cost optimization.

Host name preservation: Learn why it's important to preserve the original HTTP host name between a reverse proxy and its back-end web application, and how to implement this recommendation for the most common Azure services. Related pillars or patterns: Design and implementation, Reliability.

Message encoding considerations: Use asynchronous messages to exchange information between system components. Choose the payload structure, encoding format, and serialization library that work best with your data. Related pillars or patterns: Messaging, Security.

Monitoring and diagnostics: Track system health, usage, and performance with a monitoring and diagnostics pipeline. Turn monitoring data into alerts, reports, and triggers that help in various situations. Examples include detecting and correcting issues, spotting potential problems, meeting performance guarantees, and fulfilling auditing requirements. Related pillars or patterns: Operational excellence.

Retry guidance for specific services: Use, adapt, and extend the retry mechanisms that Azure services and client SDKs offer. Develop a systematic and robust approach for managing temporary issues with connections, operations, and resources. Related pillars or patterns: Design and implementation, Reliability.

Transient fault handling: Handle transient faults caused by unavailable networks or resources. Overcome challenges when developing appropriate retry strategies. Avoid duplicating layers of retry code and other anti-patterns. Related pillars or patterns: Design and implementation, Reliability.
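To make the transient fault handling and retry practices concrete, here is a minimal retry helper with exponential backoff. It is a sketch under assumed requirements (fixed attempt count, no jitter); production code should also cap total elapsed time, add randomized jitter, and retry only errors known to be transient.

```python
import time

def retry(operation, attempts: int = 3, base_delay: float = 0.1):
    """Call operation(); on failure, wait base_delay * 2**n and retry.
    Only transient errors should be retried; here any exception counts."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # attempts exhausted; surface the fault to the caller
            time.sleep(base_delay * (2 ** attempt))

# Simulate a dependency that fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(retry(flaky))  # succeeds on the third attempt
```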

Next steps
Web API design
Web API implementation

Related resources
Cloud design patterns
Microsoft Azure Well-Architected Framework
RESTful web API design
3/10/2022 • 28 minutes to read • Edit Online

Most modern web applications expose APIs that clients can use to interact with the application. A well-designed
web API should aim to support:
Platform independence. Any client should be able to call the API, regardless of how the API is
implemented internally. This requires using standard protocols, and having a mechanism whereby the
client and the web service can agree on the format of the data to exchange.
Service evolution. The web API should be able to evolve and add functionality independently from
client applications. As the API evolves, existing client applications should continue to function without
modification. All functionality should be discoverable so that client applications can fully use it.
This guidance describes issues that you should consider when designing a web API.

What is REST?
In 2000, Roy Fielding proposed Representational State Transfer (REST) as an architectural approach to designing
web services. REST is an architectural style for building distributed systems based on hypermedia. REST is
independent of any underlying protocol and is not necessarily tied to HTTP. However, most common REST API
implementations use HTTP as the application protocol, and this guide focuses on designing REST APIs for HTTP.
A primary advantage of REST over HTTP is that it uses open standards, and does not bind the implementation of
the API or the client applications to any specific implementation. For example, a REST web service could be
written in ASP.NET, and client applications can use any language or toolset that can generate HTTP requests and
parse HTTP responses.
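As the paragraph notes, any toolset that can issue HTTP requests and parse responses can act as a client. For example, using only the Python standard library to parse the JSON representation of an order resource (the payload mirrors the example shown later in this article; the URI is fictitious):

```python
import json

# A body such as GET https://adventure-works.com/orders/1 might return.
body = '{"orderId":1,"orderValue":99.90,"productId":1,"quantity":1}'

order = json.loads(body)  # the representation decouples client from server
line_total = order["orderValue"] * order["quantity"]
print(order["orderId"], line_total)
```

Nothing here depends on how the service is implemented; a client in any other language would parse the same representation the same way.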
Here are some of the main design principles of RESTful APIs using HTTP:
REST APIs are designed around resources, which are any kind of object, data, or service that can be
accessed by the client.
A resource has an identifier, which is a URI that uniquely identifies that resource. For example, the URI for
a particular customer order might be:

https://adventure-works.com/orders/1

Clients interact with a service by exchanging representations of resources. Many web APIs use JSON as
the exchange format. For example, a GET request to the URI listed above might return this response body:

{"orderId":1,"orderValue":99.90,"productId":1,"quantity":1}

REST APIs use a uniform interface, which helps to decouple the client and service implementations. For
REST APIs built on HTTP, the uniform interface includes using standard HTTP verbs to perform operations
on resources. The most common operations are GET, POST, PUT, PATCH, and DELETE.
REST APIs use a stateless request model. HTTP requests should be independent and may occur in any
order, so keeping transient state information between requests is not feasible. The only place where
information is stored is in the resources themselves, and each request should be an atomic operation.
This constraint enables web services to be highly scalable, because there is no need to retain any affinity
