Cloud Application Architecture Fundamentals
This library of content presents a structured approach for designing applications on Azure that are scalable,
secure, resilient, and highly available. The guidance is based on proven practices that we have learned from
customer engagements.
Introduction
The cloud is changing how applications are designed and secured. Instead of monoliths, applications are
decomposed into smaller, decentralized services. These services communicate through APIs or by using
asynchronous messaging or eventing. Applications scale horizontally, adding new instances as demand requires.
These trends bring new challenges. Application state is distributed. Operations are done in parallel and
asynchronously. Applications must be resilient when failures occur. Malicious actors continuously target
applications. Deployments must be automated and predictable. Monitoring and telemetry are critical for gaining
insight into the system. This guide is designed to help you navigate these changes.
Monolithic | Decomposed
Designed for predictable scalability | Designed for elastic scale
Relational database | Polyglot persistence (mix of storage technologies)
Synchronous processing | Asynchronous processing
Design to avoid failures (MTBF) | Design for failure (MTTR)
Occasional large updates | Frequent small updates
Manual management | Automated self-management
Snowflake servers | Immutable infrastructure
Technology choices
Once you know the type of architecture you are building, you can start to choose the main technology pieces for
the architecture. The following technology choices are critical:
Compute refers to the hosting model for the computing resources that your applications run on. For
more information, see Choose a compute service.
Data stores include databases but also storage for message queues, caches, logs, and anything else that
an application might persist to storage. For more information, see Choose a data store.
Messaging technologies enable asynchronous messages between components of the system. For more
information, see Choose a messaging service.
You will probably have to make additional technology choices along the way, but these three elements
(compute, data, and messaging) are central to most cloud applications and will determine many aspects of your
design.
Next steps
Architecture styles
Architecture styles
An architecture style is a family of architectures that share certain characteristics. For example, N-tier is a
common architecture style. More recently, microservice architectures have started to gain favor. Architecture
styles don't require the use of particular technologies, but some technologies are well-suited for certain
architectures. For example, containers are a natural fit for microservices.
We have identified a set of architecture styles that are commonly found in cloud applications. The article for
each style includes:
A description and logical diagram of the style.
Recommendations for when to choose this style.
Benefits, challenges, and best practices.
A recommended deployment using relevant Azure services.
Architecture style | Description | When to use
Web-Queue-Worker | Front and backend jobs, decoupled by async messaging. | Relatively simple domain with some resource-intensive tasks.
Big data | Divide a huge dataset into small chunks. Parallel processing on local datasets. | Batch and real-time data analysis. Predictive analysis using ML.
Big compute | Data allocation to thousands of cores. | Compute-intensive domains such as simulation.
Big compute architecture style
The term big compute describes large-scale workloads that require a large number of cores, often numbering in
the hundreds or thousands. Scenarios include image rendering, fluid dynamics, financial risk modeling, oil
exploration, drug design, and engineering stress analysis, among others.
Benefits
High performance with "embarrassingly parallel" processing.
Can harness hundreds or thousands of computer cores to solve large problems faster.
Access to specialized high-performance hardware, with dedicated high-speed InfiniBand networks.
You can provision VMs as needed to do work, and then tear them down.
Challenges
Managing the VM infrastructure.
Managing the volume of number crunching.
Provisioning thousands of cores in a timely manner.
For tightly coupled tasks, adding more cores can have diminishing returns. You may need to experiment to
find the optimum number of cores.
Next steps
Choose an Azure compute service for your application
High Performance Computing (HPC) on Azure
HPC cluster deployed in the cloud
Big data architecture style
A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or
complex for traditional database systems.
Big data solutions typically involve one or more of the following types of workload:
Batch processing of big data sources at rest.
Real-time processing of big data in motion.
Interactive exploration of big data.
Predictive analytics and machine learning.
Most big data architectures include some or all of the following components:
Data sources : All big data solutions start with one or more data sources. Examples include:
Application data stores, such as relational databases.
Static files produced by applications, such as web server log files.
Real-time data sources, such as IoT devices.
Data storage : Data for batch processing operations is typically stored in a distributed file store that can
hold high volumes of large files in various formats. This kind of store is often called a data lake. Options
for implementing this storage include Azure Data Lake Store or blob containers in Azure Storage.
Batch processing : Because the data sets are so large, often a big data solution must process data files
using long-running batch jobs to filter, aggregate, and otherwise prepare the data for analysis. Usually
these jobs involve reading source files, processing them, and writing the output to new files. Options
include running U-SQL jobs in Azure Data Lake Analytics, using Hive, Pig, or custom Map/Reduce jobs in
an HDInsight Hadoop cluster, or using Java, Scala, or Python programs in an HDInsight Spark cluster.
Real-time message ingestion : If the solution includes real-time sources, the architecture must include
a way to capture and store real-time messages for stream processing. This might be a simple data store,
where incoming messages are dropped into a folder for processing. However, many solutions need a
message ingestion store to act as a buffer for messages, and to support scale-out processing, reliable
delivery, and other message queuing semantics. Options include Azure Event Hubs, Azure IoT Hub, and
Kafka.
Stream processing : After capturing real-time messages, the solution must process them by filtering,
aggregating, and otherwise preparing the data for analysis. The processed stream data is then written to
an output sink. Azure Stream Analytics provides a managed stream processing service based on
perpetually running SQL queries that operate on unbounded streams. You can also use open source
Apache streaming technologies like Storm and Spark Streaming in an HDInsight cluster.
Analytical data store : Many big data solutions prepare data for analysis and then serve the processed
data in a structured format that can be queried using analytical tools. The analytical data store used to
serve these queries can be a Kimball-style relational data warehouse, as seen in most traditional business
intelligence (BI) solutions. Alternatively, the data could be presented through a low-latency NoSQL
technology such as HBase, or an interactive Hive database that provides a metadata abstraction over data
files in the distributed data store. Azure Synapse Analytics provides a managed service for large-scale,
cloud-based data warehousing. HDInsight supports Interactive Hive, HBase, and Spark SQL, which can
also be used to serve data for analysis.
Analysis and reporting : The goal of most big data solutions is to provide insights into the data through
analysis and reporting. To empower users to analyze the data, the architecture may include a data
modeling layer, such as a multidimensional OLAP cube or tabular data model in Azure Analysis Services.
It might also support self-service BI, using the modeling and visualization technologies in Microsoft
Power BI or Microsoft Excel. Analysis and reporting can also take the form of interactive data exploration
by data scientists or data analysts. For these scenarios, many Azure services support analytical
notebooks, such as Jupyter, enabling these users to leverage their existing skills with Python or R. For
large-scale data exploration, you can use Microsoft R Server, either standalone or with Spark.
Orchestration : Most big data solutions consist of repeated data processing operations, encapsulated in
workflows, that transform source data, move data between multiple sources and sinks, load the
processed data into an analytical data store, or push the results straight to a report or dashboard. To
automate these workflows, you can use an orchestration technology such as Azure Data Factory or Apache
Oozie and Sqoop.
Azure includes many services that can be used in a big data architecture. They fall roughly into two categories:
Managed services, including Azure Data Lake Store, Azure Data Lake Analytics, Azure Synapse Analytics,
Azure Stream Analytics, Azure Event Hubs, Azure IoT Hub, and Azure Data Factory.
Open source technologies based on the Apache Hadoop platform, including HDFS, HBase, Hive, Pig, Spark,
Storm, Oozie, Sqoop, and Kafka. These technologies are available on Azure in the Azure HDInsight service.
These options are not mutually exclusive, and many solutions combine open source technologies with Azure
services.
Benefits
Technology choices . You can mix and match Azure managed services and Apache technologies in
HDInsight clusters, to capitalize on existing skills or technology investments.
Performance through parallelism . Big data solutions take advantage of parallelism, enabling high-
performance solutions that scale to large volumes of data.
Elastic scale . All of the components in the big data architecture support scale-out provisioning, so that you
can adjust your solution to small or large workloads, and pay only for the resources that you use.
Interoperability with existing solutions . The components of the big data architecture are also used for
IoT processing and enterprise BI solutions, enabling you to create an integrated solution across data
workloads.
Challenges
Complexity . Big data solutions can be extremely complex, with numerous components to handle data
ingestion from multiple data sources. It can be challenging to build, test, and troubleshoot big data processes.
Moreover, there may be a large number of configuration settings across multiple systems that must be set
correctly in order to optimize performance.
Skillset . Many big data technologies are highly specialized, and use frameworks and languages that are not
typical of more general application architectures. On the other hand, big data technologies are evolving new
APIs that build on more established languages. For example, the U-SQL language in Azure Data Lake
Analytics is based on a combination of Transact-SQL and C#. Similarly, SQL-based APIs are available for Hive,
HBase, and Spark.
Technology maturity . Many of the technologies used in big data are evolving. While core Hadoop
technologies such as Hive and Pig have stabilized, emerging technologies such as Spark introduce extensive
changes and enhancements with each new release. Managed services such as Azure Data Lake Analytics and
Azure Data Factory are relatively young, compared with other Azure services, and will likely evolve over time.
Security . Big data solutions usually rely on storing all static data in a centralized data lake. Securing access
to this data can be challenging, especially when the data must be ingested and consumed by multiple
applications and platforms.
Best practices
Leverage parallelism . Most big data processing technologies distribute the workload across multiple
processing units. This requires that static data files are created and stored in a splittable format.
Distributed file systems such as HDFS can optimize read and write performance, and the actual
processing is performed by multiple cluster nodes in parallel, which reduces overall job times.
Partition data . Batch processing usually happens on a recurring schedule — for example, weekly or
monthly. Partition data files, and data structures such as tables, based on temporal periods that match the
processing schedule. That simplifies data ingestion and job scheduling, and makes it easier to
troubleshoot failures. Also, partitioning tables that are used in Hive, U-SQL, or SQL queries can
significantly improve query performance.
Apply schema-on-read semantics . Using a data lake lets you combine storage for files in multiple
formats, whether structured, semi-structured, or unstructured. Use schema-on-read semantics, which
project a schema onto the data when the data is being processed, not when the data is stored. This builds
flexibility into the solution, and prevents bottlenecks during data ingestion caused by data validation and
type checking.
Process data in-place . Traditional BI solutions often use an extract, transform, and load (ETL) process to
move data into a data warehouse. With larger volumes of data, and a greater variety of formats, big data
solutions generally use variations of ETL, such as transform, extract, and load (TEL). With this approach,
the data is processed within the distributed data store, transforming it to the required structure, before
moving the transformed data into an analytical data store.
Balance utilization and time costs . For batch processing jobs, it's important to consider two factors:
The per-unit cost of the compute nodes, and the per-minute cost of using those nodes to complete the
job. For example, a batch job may take eight hours with four cluster nodes. However, it might turn out
that the job uses all four nodes only during the first two hours, and after that, only two nodes are
required. In that case, running the entire job on two nodes would increase the total job time, but would
not double it, so the total cost would be less. In some business scenarios, a longer processing time may
be preferable to the higher cost of using underutilized cluster resources.
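To make this trade-off concrete, the following sketch works through the node-hour arithmetic for the example above. The per-node-hour price is a placeholder; substitute your own cluster rates.

```python
# Illustration of the utilization-versus-time trade-off described above.
# The per-node-hour price is a made-up figure; substitute your own rates.
NODE_HOUR_PRICE = 1.00  # hypothetical cost of one cluster node for one hour

# Scenario A: keep four nodes provisioned for the full eight-hour job.
cost_four_nodes = 4 * 8 * NODE_HOUR_PRICE          # 32 node-hours

# Scenario B: two nodes throughout. The first phase (which kept four nodes
# busy for two hours, i.e. 8 node-hours of work) now takes roughly four hours
# on two nodes; the remaining six hours of two-node work are unchanged.
estimated_hours_two_nodes = 4 + 6                   # ~10 hours total
cost_two_nodes = 2 * estimated_hours_two_nodes * NODE_HOUR_PRICE  # 20 node-hours

print(f"4 nodes: {cost_four_nodes:.0f} node-hours, finishes in 8 h")
print(f"2 nodes: {cost_two_nodes:.0f} node-hours, finishes in ~{estimated_hours_two_nodes} h")
```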
Separate cluster resources . When deploying HDInsight clusters, you will normally achieve better
performance by provisioning separate cluster resources for each type of workload. For example, although
Spark clusters include Hive, if you need to perform extensive processing with both Hive and Spark, you
should consider deploying separate dedicated Spark and Hadoop clusters. Similarly, if you are using
HBase and Storm for low latency stream processing and Hive for batch processing, consider separate
clusters for Storm, HBase, and Hadoop.
Orchestrate data ingestion . In some cases, existing business applications may write data files for batch
processing directly into Azure storage blob containers, where they can be consumed by HDInsight or
Azure Data Lake Analytics. However, you will often need to orchestrate the ingestion of data from on-
premises or external data sources into the data lake. Use an orchestration workflow or pipeline, such as
those supported by Azure Data Factory or Oozie, to achieve this in a predictable and centrally
manageable fashion.
Scrub sensitive data early . The data ingestion workflow should scrub sensitive data early in the
process, to avoid storing it in the data lake.
IoT architecture
Internet of Things (IoT) is a specialized subset of big data solutions. The following diagram shows a possible
logical architecture for IoT. The diagram emphasizes the event-streaming components of the architecture.
The cloud gateway ingests device events at the cloud boundary, using a reliable, low latency messaging
system.
Devices might send events directly to the cloud gateway, or through a field gateway . A field gateway is a
specialized device or software, usually colocated with the devices, that receives events and forwards them to the
cloud gateway. The field gateway might also preprocess the raw device events, performing functions such as
filtering, aggregation, or protocol transformation.
After ingestion, events go through one or more stream processors that can route the data (for example, to
storage) or perform analytics and other processing.
The following are some common types of processing. (This list is certainly not exhaustive.)
Writing event data to cold storage, for archiving or batch analytics.
Hot path analytics, analyzing the event stream in (near) real time, to detect anomalies, recognize patterns
over rolling time windows, or trigger alerts when a specific condition occurs in the stream.
Handling special types of non-telemetry messages from devices, such as notifications and alarms.
Machine learning.
The boxes that are shaded gray show components of an IoT system that are not directly related to event
streaming, but are included here for completeness.
The device registry is a database of the provisioned devices, including the device IDs and usually device
metadata, such as location.
The provisioning API is a common external interface for provisioning and registering new devices.
Some IoT solutions allow command and control messages to be sent to devices.
This section has presented a very high-level view of IoT, and there are many subtleties and challenges to
consider. For a more detailed reference architecture and discussion, see the Microsoft Azure IoT Reference
Architecture (PDF download).
Next steps
Learn more about big data architectures.
Learn more about IoT solutions.
Event-driven architecture style
An event-driven architecture consists of event producers that generate a stream of events, and event
consumers that listen for the events.
Events are delivered in near real time, so consumers can respond immediately to events as they occur. Producers
are decoupled from consumers — a producer doesn't know which consumers are listening. Consumers are also
decoupled from each other, and every consumer sees all of the events. This differs from a Competing
Consumers pattern, where consumers pull messages from a queue and a message is processed just once
(assuming no errors). In some systems, such as IoT, events must be ingested at very high volumes.
An event driven architecture can use a pub/sub model or an event stream model.
Pub/sub : The messaging infrastructure keeps track of subscriptions. When an event is published, it sends
the event to each subscriber. After an event is received, it cannot be replayed, and new subscribers do not
see the event.
Event streaming : Events are written to a log. Events are strictly ordered (within a partition) and durable.
Clients don't subscribe to the stream, instead a client can read from any part of the stream. The client is
responsible for advancing its position in the stream. That means a client can join at any time, and can
replay events.
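The following minimal sketch illustrates the event streaming model described above: an append-only log plus clients that track their own read positions. The classes are illustrative stand-ins, not any particular Azure or Kafka API.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class EventLog:
    """Append-only event log, standing in for one partition of a streaming service."""
    events: list = field(default_factory=list)

    def append(self, event: Any) -> int:
        self.events.append(event)
        return len(self.events) - 1            # offset of the new event

    def read_from(self, offset: int) -> list:
        return self.events[offset:]            # durable events can be re-read

@dataclass
class StreamClient:
    """Each client owns its position in the stream, so it can join late or replay."""
    log: EventLog
    position: int = 0

    def poll(self) -> list:
        batch = self.log.read_from(self.position)
        self.position += len(batch)            # advance only after reading
        return batch

log = EventLog()
for i in range(3):
    log.append({"id": i, "type": "telemetry"})

late_joiner = StreamClient(log)                # joins after events were written
print(late_joiner.poll())                      # still sees all three events
replayer = StreamClient(log, position=1)       # replays from a chosen offset
print(replayer.poll())                         # events 1 and 2
```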
On the consumer side, there are some common variations:
Simple event processing . An event immediately triggers an action in the consumer. For example, you
could use Azure Functions with a Service Bus trigger, so that a function executes whenever a message is
published to a Service Bus topic.
Complex event processing . A consumer processes a series of events, looking for patterns in the event
data, using a technology such as Azure Stream Analytics or Apache Storm. For example, you could
aggregate readings from an embedded device over a time window, and generate a notification if the
moving average crosses a certain threshold.
Event stream processing . Use a data streaming platform, such as Azure IoT Hub or Apache Kafka, as a
pipeline to ingest events and feed them to stream processors. The stream processors act to process or
transform the stream. There may be multiple stream processors for different subsystems of the
application. This approach is a good fit for IoT workloads.
The source of the events may be external to the system, such as physical devices in an IoT solution. In that case,
the system must be able to ingest the data at the volume and throughput that is required by the data source.
In the logical diagram above, each type of consumer is shown as a single box. In practice, it's common to have
multiple instances of a consumer, to avoid having the consumer become a single point of failure in the system.
Multiple instances might also be necessary to handle the volume and frequency of events. Also, a single
consumer might process events on multiple threads. This can create challenges if events must be processed in
order or require exactly-once semantics. See Minimize Coordination.
Benefits
Producers and consumers are decoupled.
No point-to-point integrations. It's easy to add new consumers to the system.
Consumers can respond to events immediately as they arrive.
Highly scalable and distributed.
Subsystems have independent views of the event stream.
Challenges
Guaranteed delivery. In some systems, especially in IoT scenarios, it's crucial to guarantee that events are
delivered.
Processing events in order or exactly once. Each consumer type typically runs in multiple instances, for
resiliency and scalability. This can create a challenge if the events must be processed in order (within a
consumer type), or if the processing logic is not idempotent.
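One common mitigation is to make consumers idempotent by deduplicating on an event identifier, so a redelivered event has no additional effect. The sketch below assumes events carry a unique id field and uses an in-memory set for brevity; a real consumer would keep the "seen" set in a durable store shared across instances.

```python
# Hypothetical consumer that tolerates redelivery by tracking processed event IDs.
processed_ids: set[str] = set()

def apply_side_effect(event: dict) -> None:
    """The real (non-idempotent) work, e.g. writing to a downstream system."""
    print(f"processing {event['id']}: {event['payload']}")

def handle(event: dict) -> None:
    event_id = event["id"]
    if event_id in processed_ids:
        return                       # duplicate delivery: safe to ignore
    apply_side_effect(event)
    processed_ids.add(event_id)      # record only after the work succeeds

handle({"id": "42", "payload": "temperature=21.5"})
handle({"id": "42", "payload": "temperature=21.5"})   # redelivered; processed once
```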
Additional considerations
The amount of data to include in an event can be a significant consideration that affects both performance
and cost. Putting all the relevant information needed for processing in the event itself can simplify the
processing code and save additional lookups. Putting the minimal amount of information in an event, like
just a couple of identifiers, will reduce transport time and cost, but requires the processing code to look up
any additional information it needs. For more information on this, take a look at this blog post.
Microservices architecture style
A microservices architecture consists of a collection of small, autonomous services. Each service is self-
contained and should implement a single business capability within a bounded context. A bounded context is a
natural division within a business and provides an explicit boundary within which a domain model exists.
Benefits
Agility. Because microservices are deployed independently, it's easier to manage bug fixes and feature
releases. You can update a service without redeploying the entire application, and roll back an update if
something goes wrong. In many traditional applications, if a bug is found in one part of the application, it
can block the entire release process. New features may be held up waiting for a bug fix to be integrated,
tested, and published.
Small, focused teams . A microservice should be small enough that a single feature team can build, test,
and deploy it. Small team sizes promote greater agility. Large teams tend to be less productive, because
communication is slower, management overhead goes up, and agility diminishes.
Small code base . In a monolithic application, there is a tendency over time for code dependencies to
become tangled. Adding a new feature requires touching code in a lot of places. By not sharing code or
data stores, a microservices architecture minimizes dependencies, and that makes it easier to add new
features.
Mix of technologies . Teams can pick the technology that best fits their service, using a mix of
technology stacks as appropriate.
Fault isolation . If an individual microservice becomes unavailable, it won't disrupt the entire application,
as long as any upstream microservices are designed to handle faults correctly (for example, by
implementing circuit breaking).
Scalability . Services can be scaled independently, letting you scale out subsystems that require more
resources, without scaling out the entire application. Using an orchestrator such as Kubernetes or Service
Fabric, you can pack a higher density of services onto a single host, which allows for more efficient
utilization of resources.
Data isolation . It is much easier to perform schema updates, because only a single microservice is
affected. In a monolithic application, schema updates can become very challenging, because different
parts of the application may all touch the same data, making any alterations to the schema risky.
Challenges
The benefits of microservices don't come for free. Here are some of the challenges to consider before
embarking on a microservices architecture.
Complexity . A microservices application has more moving parts than the equivalent monolithic
application. Each service is simpler, but the entire system as a whole is more complex.
Development and testing . Writing a small service that relies on other dependent services requires a
different approach than writing a traditional monolithic or layered application. Existing tools are not
always designed to work with service dependencies. Refactoring across service boundaries can be
difficult. It is also challenging to test service dependencies, especially when the application is evolving
quickly.
Lack of governance . The decentralized approach to building microservices has advantages, but it can
also lead to problems. You may end up with so many different languages and frameworks that the
application becomes hard to maintain. It may be useful to put some project-wide standards in place,
without overly restricting teams' flexibility. This especially applies to cross-cutting functionality such as
logging.
Network congestion and latency . The use of many small, granular services can result in more
interservice communication. Also, if the chain of service dependencies gets too long (service A calls B,
which calls C...), the additional latency can become a problem. You will need to design APIs carefully.
Avoid overly chatty APIs, think about serialization formats, and look for places to use asynchronous
communication patterns like queue-based load leveling.
Data integrity . Each microservice is responsible for its own data persistence. As a result, data
consistency can be a challenge. Embrace eventual consistency where possible.
Management . To be successful with microservices requires a mature DevOps culture. Correlated logging
across services can be challenging. Typically, logging must correlate multiple service calls for a single
user operation.
Versioning . Updates to a service must not break services that depend on it. Multiple services could be
updated at any given time, so without careful design, you might have problems with backward or
forward compatibility.
Skill set . Microservices are highly distributed systems. Carefully evaluate whether the team has the skills
and experience to be successful.
Best practices
Model services around the business domain.
Decentralize everything. Individual teams are responsible for designing and building services. Avoid
sharing code or data schemas.
Data storage should be private to the service that owns the data. Use the best storage for each service
and data type.
Services communicate through well-designed APIs. Avoid leaking implementation details. APIs should
model the domain, not the internal implementation of the service.
Avoid coupling between services. Causes of coupling include shared database schemas and rigid
communication protocols.
Offload cross-cutting concerns, such as authentication and SSL termination, to the gateway.
Keep domain knowledge out of the gateway. The gateway should handle and route client requests
without any knowledge of the business rules or domain logic. Otherwise, the gateway becomes a
dependency and can cause coupling between services.
Services should have loose coupling and high functional cohesion. Functions that are likely to change
together should be packaged and deployed together. If they reside in separate services, those services
end up being tightly coupled, because a change in one service will require updating the other service.
Overly chatty communication between two services may be a symptom of tight coupling and low
cohesion.
Isolate failures. Use resiliency strategies to prevent failures within a service from cascading. See
Resiliency patterns and Designing reliable applications.
Next steps
For detailed guidance about building a microservices architecture on Azure, see Designing, building, and
operating microservices on Azure.
N-tier architecture style
An N-tier architecture divides an application into logical layers and physical tiers .
Layers are a way to separate responsibilities and manage dependencies. Each layer has a specific responsibility.
A higher layer can use services in a lower layer, but not the other way around.
Tiers are physically separated, running on separate machines. A tier can call to another tier directly, or use
asynchronous messaging (message queue). Although each layer might be hosted in its own tier, that's not
required. Several layers might be hosted on the same tier. Physically separating the tiers improves scalability
and resiliency, but also adds latency from the additional network communication.
A traditional three-tier application has a presentation tier, a middle tier, and a database tier. The middle tier is
optional. More complex applications can have more than three tiers. The diagram above shows an application
with two middle tiers, encapsulating different areas of functionality.
An N-tier application can have a closed layer architecture or an open layer architecture :
In a closed layer architecture, a layer can only call the next layer immediately down.
In an open layer architecture, a layer can call any of the layers below it.
A closed layer architecture limits the dependencies between layers. However, it might create unnecessary
network traffic, if one layer simply passes requests along to the next layer.
Benefits
Portability between cloud and on-premises, and between cloud platforms.
Lower learning curve for most developers.
Natural evolution from the traditional application model.
Open to heterogeneous environments (Windows/Linux).
Challenges
It's easy to end up with a middle tier that just does CRUD operations on the database, adding extra latency
without doing any useful work.
Monolithic design prevents independent deployment of features.
Managing an IaaS application is more work than an application that uses only managed services.
It can be difficult to manage network security in a large system.
Best practices
Use autoscaling to handle changes in load. See Autoscaling best practices.
Use asynchronous messaging to decouple tiers.
Cache semistatic data. See Caching best practices.
Configure the database tier for high availability, using a solution such as SQL Server Always On availability
groups.
Place a web application firewall (WAF) between the front end and the Internet.
Place each tier in its own subnet, and use subnets as a security boundary.
Restrict access to the data tier, by allowing requests only from the middle tier(s).
Each tier consists of two or more VMs, placed in an availability set or virtual machine scale set. Multiple VMs
provide resiliency in case one VM fails. Load balancers are used to distribute requests across the VMs in a tier. A
tier can be scaled horizontally by adding more VMs to the pool.
Each tier is also placed inside its own subnet, meaning its internal IP addresses fall within the same address
range. That makes it easy to apply network security group rules and route tables to individual tiers.
The web and business tiers are stateless. Any VM can handle any request for that tier. The data tier should
consist of a replicated database. For Windows, we recommend SQL Server, using Always On availability groups
for high availability. For Linux, choose a database that supports replication, such as Apache Cassandra.
Network security groups restrict access to each tier. For example, the database tier only allows access from the
business tier.
NOTE
The layer labeled "Business Tier" in our reference diagram is a moniker for the business logic tier. Likewise, we also call the
presentation tier the "Web Tier." In our example, this is a web application, though multi-tier architectures can be used for
other topologies as well (like desktop apps). Name your tiers what works best for your team to communicate the intent of
that logical and/or physical tier in your application - you could even express that naming in resources you choose to
represent that tier (e.g. vmss-appName-business-layer).
Web-Queue-Worker architecture style
The core components of this architecture are a web front end that serves client requests, and a worker that
performs resource-intensive tasks, long-running workflows, or batch jobs. The web front end communicates
with the worker through a message queue .
Other components that are commonly incorporated into this architecture include:
One or more databases.
A cache to store values from the database for quick reads.
A CDN to serve static content.
Remote services, such as email or SMS service. Often these are provided by third parties.
Identity provider for authentication.
The web and worker are both stateless. Session state can be stored in a distributed cache. Any long-running
work is done asynchronously by the worker. The worker can be triggered by messages on the queue, or run on a
schedule for batch processing. The worker is an optional component. If there are no long-running operations,
the worker can be omitted.
The front end might consist of a web API. On the client side, the web API can be consumed by a single-page
application that makes AJAX calls, or by a native client application.
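A minimal in-process sketch of the pattern is shown below: the front end accepts a request, drops a message on a queue, and returns immediately, while a worker drains the queue in the background. Python's queue.Queue stands in for Azure Storage queues or Service Bus, and the handler and job names are illustrative.

```python
import queue
import threading
import time

# A thread-safe queue stands in for Azure Storage queues or Service Bus.
work_queue: "queue.Queue[dict]" = queue.Queue()

def front_end_handler(request: dict) -> dict:
    """Accept the request quickly and defer the heavy work to the worker."""
    work_queue.put({"job": "generate_report", "params": request})
    return {"status": "accepted"}          # respond before the work is done

def worker_loop() -> None:
    """Long-running worker: pull messages and do the resource-intensive task."""
    while True:
        message = work_queue.get()
        if message is None:                # sentinel used here to stop the sketch
            break
        time.sleep(0.1)                    # placeholder for the real batch job
        print(f"worker finished {message['job']} for {message['params']}")
        work_queue.task_done()

worker = threading.Thread(target=worker_loop, daemon=True)
worker.start()
print(front_end_handler({"customer": "contoso"}))
work_queue.join()                          # wait for the demo message to finish
work_queue.put(None)
```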
Benefits
Relatively simple architecture that is easy to understand.
Easy to deploy and manage.
Clear separation of concerns.
The front end is decoupled from the worker using asynchronous messaging.
The front end and the worker can be scaled independently.
Challenges
Without careful design, the front end and the worker can become large, monolithic components that are
difficult to maintain and update.
There may be hidden dependencies, if the front end and worker share data schemas or code modules.
Best practices
Expose a well-designed API to the client. See API design best practices.
Autoscale to handle changes in load. See Autoscaling best practices.
Cache semi-static data. See Caching best practices.
Use a CDN to host static content. See CDN best practices.
Use polyglot persistence when appropriate. See Use the best data store for the job.
Partition data to improve scalability, reduce contention, and optimize performance. See Data partitioning best
practices.
The front end is implemented as an Azure App Service web app, and the worker is implemented as an
Azure Functions app. The web app and the function app are both associated with an App Service plan that
provides the VM instances.
You can use either Azure Service Bus or Azure Storage queues for the message queue. (The diagram
shows an Azure Storage queue.)
Azure Cache for Redis stores session state and other data that needs low latency access.
Azure CDN is used to cache static content such as images, CSS, or HTML.
For storage, choose the storage technologies that best fit the needs of the application. You might use
multiple storage technologies (polyglot persistence). To illustrate this idea, the diagram shows Azure SQL
Database and Azure Cosmos DB.
For more details, see App Service web application reference architecture.
Additional considerations
Not every transaction has to go through the queue and worker to storage. The web front end can
perform simple read/write operations directly. Workers are designed for resource-intensive tasks or
long-running workflows. In some cases, you might not need a worker at all.
Use the built-in autoscale feature of App Service to scale out the number of VM instances. If the load on
the application follows predictable patterns, use schedule-based autoscale. If the load is unpredictable,
use metrics-based autoscaling rules.
Consider putting the web app and the function app into separate App Service plans. That way, they can be
scaled independently.
Use separate App Service plans for production and testing. Otherwise, if you use the same plan for
production and testing, it means your tests are running on your production VMs.
Use deployment slots to manage deployments. This lets you deploy an updated version to a staging
slot, then swap over to the new version. It also lets you swap back to the previous version, if there was a
problem with the update.
Ten design principles for Azure applications
Follow these design principles to make your application more scalable, resilient, and manageable.
Design for self healing . In a distributed system, failures happen. Design your application to be self healing
when failures occur.
Make all things redundant . Build redundancy into your application, to avoid having single points of failure.
Minimize coordination . Minimize coordination between application services to achieve scalability.
Design to scale out . Design your application so that it can scale horizontally, adding or removing new
instances as demand requires.
Partition around limits . Use partitioning to work around database, network, and compute limits.
Design for operations . Design your application so that the operations team has the tools they need.
Use managed services . When possible, use platform as a service (PaaS) rather than infrastructure as a service
(IaaS).
Use the best data store for the job . Pick the storage technology that is the best fit for your data and how it
will be used.
Design for evolution . All successful applications change over time. An evolutionary design is key for
continuous innovation.
Build for the needs of business . Every design decision must be justified by a business requirement.
Design for self healing
Recommendations
Retry failed operations . Transient failures may occur due to momentary loss of network connectivity, a
dropped database connection, or a timeout when a service is busy. Build retry logic into your application to
handle transient failures. For many Azure services, the client SDK implements automatic retries. For more
information, see Transient fault handling and the Retry pattern.
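Where the client SDK doesn't retry for you, a hand-rolled retry helper might look like the sketch below: exponential backoff with a little jitter and a bounded number of attempts. The TransientError type and flaky_call function are placeholders for your own transient failures and operations.

```python
import random
import time

class TransientError(Exception):
    """Placeholder for errors worth retrying (timeouts, dropped connections)."""

def call_with_retries(operation, max_attempts=4, base_delay=0.5):
    """Retry a transiently failing operation with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise                               # give up: let the caller handle it
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1)
            time.sleep(delay)                       # back off before the next try

# Example: an operation that fails twice and then succeeds.
attempts = {"count": 0}
def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary network glitch")
    return "ok"

print(call_with_retries(flaky_call))   # -> "ok" after two retries
```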
Protect failing remote services (Circuit Breaker) . It's good to retry after a transient failure, but if the failure
persists, you can end up with too many callers hammering a failing service. This can lead to cascading failures,
as requests back up. Use the Circuit Breaker pattern to fail fast (without making the remote call) when an
operation is likely to fail.
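A minimal circuit breaker might be sketched as follows; the failure threshold and reset timeout are arbitrary values you would tune for your service, and this single-threaded illustration is a simplification of the pattern, not a production implementation.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: fail fast while the protected service is unhealthy."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failure_count = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast without calling the service")
            self.opened_at = None                   # half-open: allow one trial call
        try:
            result = operation()
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failure_count = 0                      # success closes the circuit again
        return result

# Usage (hypothetical remote call): wrap each request so repeated failures trip the breaker.
# breaker = CircuitBreaker()
# breaker.call(lambda: remote_service.get_order("o-1001"))
```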
Isolate critical resources (Bulkhead) . Failures in one subsystem can sometimes cascade. This can happen if a
failure causes some resources, such as threads or sockets, not to get freed in a timely manner, leading to
resource exhaustion. To avoid this, partition a system into isolated groups, so that a failure in one partition does
not bring down the entire system.
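One way to apply the Bulkhead pattern in application code is to give each downstream dependency its own small, fixed-size thread pool, as in the sketch below. The pool sizes, dependency names, and get_product function are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

# One small, dedicated pool per downstream dependency. If the payment service
# hangs and its pool saturates, calls to the catalog service are unaffected.
pools = {
    "payments": ThreadPoolExecutor(max_workers=4, thread_name_prefix="payments"),
    "catalog": ThreadPoolExecutor(max_workers=8, thread_name_prefix="catalog"),
}

def call_dependency(name: str, func, *args):
    """Run a call on the pool reserved for that dependency (the bulkhead)."""
    future = pools[name].submit(func, *args)
    return future.result(timeout=2.0)      # bound how long a caller can be held up

def get_product(product_id: str) -> dict:
    return {"id": product_id, "name": "sample product"}

print(call_dependency("catalog", get_product, "sku-123"))
```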
Perform load leveling . Applications may experience sudden spikes in traffic that can overwhelm services on
the backend. To avoid this, use the Queue-Based Load Leveling pattern to queue work items to run
asynchronously. The queue acts as a buffer that smooths out peaks in the load.
Fail over . If an instance can't be reached, fail over to another instance. For things that are stateless, like a web
server, put several instances behind a load balancer or traffic manager. For things that store state, like a
database, use replicas and fail over. Depending on the data store and how it replicates, this may require the
application to deal with eventual consistency.
Compensate failed transactions . In general, avoid distributed transactions, as they require coordination
across services and resources. Instead, compose an operation from smaller individual transactions. If the
operation fails midway through, use Compensating Transactions to undo any step that already completed.
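The sketch below shows the shape of this approach: each step is paired with an undo action, and if a later step fails, the compensations for the steps that already completed run in reverse order. The booking steps are hypothetical.

```python
# Hypothetical booking operation composed of smaller steps, each with an "undo".
def reserve_flight(ctx):   ctx["flight"] = "reserved"
def cancel_flight(ctx):    ctx["flight"] = "cancelled"
def reserve_hotel(ctx):    ctx["hotel"] = "reserved"
def cancel_hotel(ctx):     ctx["hotel"] = "cancelled"
def charge_card(ctx):      raise RuntimeError("payment service unavailable")
def refund_card(ctx):      ctx["payment"] = "refunded"

steps = [
    (reserve_flight, cancel_flight),
    (reserve_hotel, cancel_hotel),
    (charge_card, refund_card),
]

def run_with_compensation(steps, ctx):
    completed = []
    try:
        for action, compensate in steps:
            action(ctx)
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):   # undo what already succeeded
            compensate(ctx)
        raise

ctx = {}
try:
    run_with_compensation(steps, ctx)
except RuntimeError:
    print("operation failed and was compensated:", ctx)
```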
Checkpoint long-running transactions . Checkpoints can provide resiliency if a long-running operation fails.
When the operation restarts (for example, it is picked up by another VM), it can be resumed from the last
checkpoint.
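A checkpointed batch loop might look like the following sketch, which persists its progress after each item so a restarted run resumes where the previous one stopped. The local checkpoint file is only for illustration; in practice the checkpoint would live in durable shared storage.

```python
import json
import os

CHECKPOINT_FILE = "job.checkpoint"          # illustrative path; use durable storage in practice

def load_checkpoint() -> int:
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["next_index"]
    return 0

def save_checkpoint(next_index: int) -> None:
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"next_index": next_index}, f)

def handle_item(item) -> None:
    print(f"processed {item}")

def process_batch(items: list) -> None:
    """Resume from the last checkpoint if a previous run (or VM) died mid-way."""
    start = load_checkpoint()
    for index in range(start, len(items)):
        handle_item(items[index])            # the actual unit of work
        save_checkpoint(index + 1)           # persist progress after each item

process_batch(["order-1", "order-2", "order-3"])
```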
Degrade gracefully . Sometimes you can't work around a problem, but you can provide reduced functionality
that is still useful. Consider an application that shows a catalog of books. If the application can't retrieve the
thumbnail image for the cover, it might show a placeholder image. Entire subsystems might be noncritical for
the application. For example, in an e-commerce site, showing product recommendations is probably less critical
than processing orders.
Throttle clients . Sometimes a small number of users create excessive load, which can reduce your application's
availability for other users. In this situation, throttle the client for a certain period of time. See the Throttling
pattern.
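A sliding-window throttle per client is one simple way to implement this; the window length and quota below are illustrative, and a real implementation would share state across instances.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 100          # illustrative quota, not a real service limit

recent_requests = defaultdict(deque)   # client id -> timestamps of recent requests

def allow_request(client_id: str) -> bool:
    """Sliding-window throttle: reject a client that exceeds its quota."""
    now = time.monotonic()
    window = recent_requests[client_id]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()               # drop requests that fell out of the window
    if len(window) >= MAX_REQUESTS_PER_WINDOW:
        return False                   # caller should respond with HTTP 429
    window.append(now)
    return True
```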
Block bad actors . Just because you throttle a client, it doesn't mean the client was acting maliciously. It just means
the client exceeded their service quota. But if a client consistently exceeds their quota or otherwise behaves
badly, you might block them. Define an out-of-band process for users to request getting unblocked.
Use leader election . When you need to coordinate a task, use Leader Election to select a coordinator. That way,
the coordinator is not a single point of failure. If the coordinator fails, a new one is selected. Rather than
implement a leader election algorithm from scratch, consider an off-the-shelf solution such as Zookeeper.
Test with fault injection . All too often, the success path is well tested but not the failure path. A system could
run in production for a long time before a failure path is exercised. Use fault injection to test the resiliency of the
system to failures, either by triggering actual failures or by simulating them.
Embrace chaos engineering . Chaos engineering extends the notion of fault injection, by randomly injecting
failures or abnormal conditions into production instances.
For a structured approach to making your applications self healing, see Design reliable applications for Azure.
Make all things redundant
Recommendations
Consider business requirements . The amount of redundancy built into a system can affect both cost and
complexity. Your architecture should be informed by your business requirements, such as recovery time
objective (RTO). For example, a multi-region deployment is more expensive than a single-region deployment,
and is more complicated to manage. You will need operational procedures to handle failover and failback. The
additional cost and complexity might be justified for some business scenarios and not others.
Place VMs behind a load balancer . Don't use a single VM for mission-critical workloads. Instead, place
multiple VMs behind a load balancer. If any VM becomes unavailable, the load balancer distributes traffic to the
remaining healthy VMs. To learn how to deploy this configuration, see Multiple VMs for scalability and
availability.
Replicate databases . Azure SQL Database and Cosmos DB automatically replicate the data within a region,
and you can enable geo-replication across regions. If you are using an IaaS database solution, choose one that
supports replication and failover, such as SQL Server Always On availability groups.
Enable geo-replication . Geo-replication for Azure SQL Database and Cosmos DB creates secondary readable
replicas of your data in one or more secondary regions. In the event of an outage, the database can fail over to
the secondary region for writes.
Partition for availability . Database partitioning is often used to improve scalability, but it can also improve
availability. If one shard goes down, the other shards can still be reached. A failure in one shard will only disrupt
a subset of the total transactions.
Deploy to more than one region . For the highest availability, deploy the application to more than one
region. That way, in the rare case when a problem affects an entire region, the application can fail over to
another region. The following diagram shows a multi-region application that uses Azure Traffic Manager to
handle failover.
Synchronize front and backend failover . Use Azure Traffic Manager to fail over the front end. If the front
end becomes unreachable in one region, Traffic Manager will route new requests to the secondary region.
Depending on your database solution, you may need to coordinate failing over the database.
Use automatic failover but manual failback . Use Traffic Manager for automatic failover, but not for
automatic failback. Automatic failback carries a risk that you might switch to the primary region before the
region is completely healthy. Instead, verify that all application subsystems are healthy before manually failing
back. Also, depending on the database, you might need to check data consistency before failing back.
Include redundancy for Traffic Manager . Traffic Manager is a possible failure point. Review the Traffic
Manager SLA, and determine whether using Traffic Manager alone meets your business requirements for high
availability. If not, consider adding another traffic management solution as a failback. If the Azure Traffic
Manager service fails, change your CNAME records in DNS to point to the other traffic management service.
Design to scale out
Recommendations
Avoid instance stickiness . Stickiness, or session affinity, is when requests from the same client are always
routed to the same server. Stickiness limits the application's ability to scale out. For example, traffic from a high-
volume user will not be distributed across instances. Causes of stickiness include storing session state in
memory, and using machine-specific keys for encryption. Make sure that any instance can handle any request.
Identify bottlenecks . Scaling out isn't a magic fix for every performance issue. For example, if your backend
database is the bottleneck, it won't help to add more web servers. Identify and resolve the bottlenecks in the
system first, before throwing more instances at the problem. Stateful parts of the system are the most likely
cause of bottlenecks.
Decompose workloads by scalability requirements. Applications often consist of multiple workloads, with
different requirements for scaling. For example, an application might have a public-facing site and a separate
administration site. The public site may experience sudden surges in traffic, while the administration site has a
smaller, more predictable load.
Offload resource-intensive tasks. Tasks that require a lot of CPU or I/O resources should be moved to
background jobs when possible, to minimize the load on the front end that is handling user requests.
Use built-in autoscaling features . Many Azure compute services have built-in support for autoscaling. If the
application has a predictable, regular workload, scale out on a schedule. For example, scale out during business
hours. Otherwise, if the workload is not predictable, use performance metrics such as CPU or request queue
length to trigger autoscaling. For autoscaling best practices, see Autoscaling.
Consider aggressive autoscaling for critical workloads . For critical workloads, you want to keep ahead of
demand. It's better to add new instances quickly under heavy load to handle the additional traffic, and then
gradually scale back.
Design for scale in . Remember that with elastic scale, the application will have periods of scale-in, when
instances get removed. The application must gracefully handle instances being removed. Here are some ways to
handle scale-in (a minimal shutdown-handling sketch follows this list):
Listen for shutdown events (when available) and shut down cleanly.
Clients/consumers of a service should support transient fault handling and retry.
For long-running tasks, consider breaking up the work, using checkpoints or the Pipes and Filters pattern.
Put work items on a queue so that another instance can pick up the work, if an instance is removed in the
middle of processing.
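Here is a minimal sketch of the first two points: listen for the shutdown signal, stop taking new work, and hand unfinished items back so another instance can pick them up. The get_next_item and abandon_item hooks are hypothetical placeholders for your queue client.

```python
import signal
import threading
import time

shutting_down = threading.Event()

def on_shutdown(signum, frame):
    """Triggered when the platform reclaims the instance (scale-in, restart)."""
    shutting_down.set()

signal.signal(signal.SIGTERM, on_shutdown)   # many hosts send SIGTERM before removal

def process(item):
    print(f"processed {item}")

def worker_loop(get_next_item, abandon_item):
    """Drain work until shutdown is requested, then exit cleanly."""
    while not shutting_down.is_set():
        item = get_next_item()               # hypothetical hook: fetch next queue message
        if item is None:
            time.sleep(1)
            continue
        try:
            process(item)
        except Exception:
            abandon_item(item)               # hypothetical hook: return the message so
                                             # another instance can pick up the work
```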
Partition around limits
Recommendations
Partition different parts of the application . Databases are one obvious candidate for partitioning, but also
consider storage, cache, queues, and compute instances.
Design the partition key to avoid hotspots . If you partition a database, but one shard still gets the majority
of the requests, then you haven't solved your problem. Ideally, load gets distributed evenly across all the
partitions. For example, hash by customer ID and not the first letter of the customer name, because some letters
are more frequent. The same principle applies when partitioning a message queue. Pick a partition key that
leads to an even distribution of messages across the set of queues. For more information, see Sharding.
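For example, hashing the full customer ID (rather than keying on its first letter) spreads skewed key distributions evenly across a fixed number of partitions. The partition count below is arbitrary.

```python
import hashlib

PARTITION_COUNT = 16

def partition_for(customer_id: str) -> int:
    """Hash the full key so load spreads evenly, instead of using its first letter."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % PARTITION_COUNT

# Skewed keys (many customers whose names start with "a") still spread out:
for customer in ["alice", "aaron", "abigail", "zoe"]:
    print(customer, "-> partition", partition_for(customer))
```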
Partition around Azure subscription and service limits . Individual components and services have limits,
but there are also limits for subscriptions and resource groups. For very large applications, you might need to
partition around those limits.
Partition at different levels . Consider a database server deployed on a VM. The VM has a VHD that is backed
by Azure Storage. The storage account belongs to an Azure subscription. Notice that each step in the hierarchy
has limits. The database server may have a connection pool limit. VMs have CPU and network limits. Storage has
IOPS limits. The subscription has limits on the number of VM cores. Generally, it's easier to partition lower in the
hierarchy. Only large applications should need to partition at the subscription level.
Design for operations
Design an application so that the operations team has the tools they
need
The cloud has dramatically changed the role of the operations team. They are no longer responsible for
managing the hardware and infrastructure that hosts the application. That said, operations is still a critical part
of running a successful cloud application. Some of the important functions of the operations team include:
Deployment
Monitoring
Escalation
Incident response
Security auditing
Robust logging and tracing are particularly important in cloud applications. Involve the operations team in
design and planning, to ensure the application gives them the data and insight they need to be successful.
Recommendations
Make all things observable . Once a solution is deployed and running, logs and traces are your primary
insight into the system. Tracing records a path through the system, and is useful to pinpoint bottlenecks,
performance issues, and failure points. Logging captures individual events such as application state changes,
errors, and exceptions. Log in production, or else you lose insight at the very times when you need it the most.
Instrument for monitoring . Monitoring gives insight into how well (or poorly) an application is performing,
in terms of availability, performance, and system health. For example, monitoring tells you whether you are
meeting your SLA. Monitoring happens during the normal operation of the system. It should be as close to real-
time as possible, so that the operations staff can react to issues quickly. Ideally, monitoring can help avert
problems before they lead to a critical failure. For more information, see Monitoring and diagnostics.
Instrument for root cause analysis . Root cause analysis is the process of finding the underlying cause of
failures. It occurs after a failure has already happened.
Use distributed tracing . Use a distributed tracing system that is designed for concurrency, asynchrony, and
cloud scale. Traces should include a correlation ID that flows across service boundaries. A single operation may
involve calls to multiple application services. If an operation fails, the correlation ID helps to pinpoint the cause
of the failure.
Standardize logs and metrics . The operations team will need to aggregate logs from across the various
services in your solution. If every service uses its own logging format, it becomes difficult or impossible to get
useful information from them. Define a common schema that includes fields such as correlation ID, event name,
IP address of the sender, and so forth. Individual services can derive custom schemas that inherit the base
schema, and contain additional fields.
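A minimal sketch of such a base schema, emitted as structured JSON with a correlation ID that flows across services, might look like this; the field and service names are illustrative.

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("orders")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_event(event_name: str, correlation_id: str, **extra) -> None:
    """Emit one JSON log line using a shared base schema; services may add fields."""
    record = {
        "timestamp": time.time(),
        "correlationId": correlation_id,    # flows across every service in the call
        "service": "order-service",         # illustrative service name
        "eventName": event_name,
        **extra,                            # service-specific fields extend the schema
    }
    logger.info(json.dumps(record))

correlation_id = str(uuid.uuid4())          # created at the edge, passed downstream
log_event("OrderReceived", correlation_id, orderId="o-1001")
log_event("PaymentRequested", correlation_id, amount=42.50)
```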
Automate management tasks , including provisioning, deployment, and monitoring. Automating a task
makes it repeatable and less prone to human errors.
Treat configuration as code . Check configuration files into a version control system, so that you can track and
version your changes, and roll back if needed.
Use platform as a service (PaaS) options
Instead of running | Consider using
Hadoop | HDInsight
MongoDB | Cosmos DB
Please note that this is not meant to be an exhaustive list, but a subset of equivalent options.
Use the best data store for the job
Pick the storage technology that is the best fit for your data and how
it will be used
Gone are the days when you would just stick all of your data into a big relational SQL database. Relational
databases are very good at what they do — providing ACID guarantees for transactions over relational data. But
they come with some costs:
Queries may require expensive joins.
Data must be normalized and conform to a predefined schema (schema on write).
Lock contention may impact performance.
In any large solution, it's likely that a single data store technology won't fill all your needs. Alternatives to
relational databases include key/value stores, document databases, search engine databases, time series
databases, column family databases, and graph databases. Each has pros and cons, and different types of data fit
more naturally into one or another.
For example, you might store a product catalog in a document database, such as Cosmos DB, which allows for a
flexible schema. In that case, each product description is a self-contained document. For queries over the entire
catalog, you might index the catalog and store the index in Azure Search. Product inventory might go into a SQL
database, because that data requires ACID guarantees.
Remember that data includes more than just the persisted application data. It also includes application logs,
events, messages, and caches.
Recommendations
Don't use a relational database for ever ything . Consider other data stores when appropriate. See Choose
the right data store.
Embrace polyglot persistence . In any large solution, it's likely that a single data store technology won't fill all
your needs.
Consider the type of data . For example, put transactional data into SQL, put JSON documents into a
document database, put telemetry data into a time series database, put application logs in Elasticsearch, and put
blobs in Azure Blob Storage.
Prefer availability over (strong) consistency . The CAP theorem implies that a distributed system must
make trade-offs between availability and consistency. (Network partitions, the other leg of the CAP theorem, can
never be completely avoided.) Often, you can achieve higher availability by adopting an eventual consistency
model.
Consider the skillset of the development team . There are advantages to using polyglot persistence, but it's
possible to go overboard. Adopting a new data storage technology requires a new set of skills. The development
team must understand how to get the most out of the technology. They must understand appropriate usage
patterns, how to optimize queries, tune for performance, and so on. Factor this in when considering storage
technologies.
Use compensating transactions . A side effect of polyglot persistence is that a single transaction might write
data to multiple stores. If something fails, use compensating transactions to undo any steps that already
completed.
Look at bounded contexts . Bounded context is a term from domain driven design. A bounded context is an
explicit boundary around a domain model, and defines which parts of the domain the model applies to. Ideally, a
bounded context maps to a subdomain of the business domain. The bounded contexts in your system are a
natural place to consider polyglot persistence. For example, "products" may appear in both the Product Catalog
subdomain and the Product Inventory subdomain, but it's very likely that these two subdomains have different
requirements for storing, updating, and querying products.
Design for evolution
Recommendations
Enforce high cohesion and loose coupling. A service is cohesive if it provides functionality that logically
belongs together. Services are loosely coupled if you can change one service without changing the other. High
cohesion generally means that changes in one function will require changes in other related functions. If you
find that updating a service requires coordinated updates to other services, it may be a sign that your services
are not cohesive. One of the goals of domain-driven design (DDD) is to identify those boundaries.
Encapsulate domain knowledge. When a client consumes a service, the responsibility for enforcing the
business rules of the domain should not fall on the client. Instead, the service should encapsulate all of the
domain knowledge that falls under its responsibility. Otherwise, every client has to enforce the business rules,
and you end up with domain knowledge spread across different parts of the application.
Use asynchronous messaging. Asynchronous messaging is a way to decouple the message producer from
the consumer. The producer does not depend on the consumer responding to the message or taking any
particular action. With a pub/sub architecture, the producer may not even know who is consuming the message.
New services can easily consume the messages without any modifications to the producer.
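As a sketch of what the producer side can look like, the snippet below publishes an order event to a Service Bus topic with the azure-servicebus Python SDK. The connection string, topic name, and event shape are placeholders; any number of subscriptions can later be added to consume the event without touching this code.

```python
# A minimal sketch of publishing an event to a Service Bus topic.
# The connection string, topic name, and event fields are placeholders.
import json
from azure.servicebus import ServiceBusClient, ServiceBusMessage

conn_str = "<your-service-bus-connection-string>"
event = {"type": "OrderPlaced", "orderId": "order-1", "total": 42.50}

with ServiceBusClient.from_connection_string(conn_str) as client:
    with client.get_topic_sender(topic_name="orders") as sender:
        # The producer only publishes; it does not know or care who subscribes.
        sender.send_messages(ServiceBusMessage(json.dumps(event)))
```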
Don't build domain knowledge into a gateway. Gateways can be useful in a microservices architecture, for
things like request routing, protocol translation, load balancing, or authentication. However, the gateway should
be restricted to this sort of infrastructure functionality. It should not implement any domain knowledge, to avoid
becoming a heavy dependency.
Expose open interfaces. Avoid creating custom translation layers that sit between services. Instead, a service
should expose an API with a well-defined API contract. The API should be versioned, so that you can evolve the
API while maintaining backward compatibility. That way, you can update a service without coordinating updates
to all of the upstream services that depend on it. Public-facing services should expose a RESTful API over HTTP.
Backend services might use an RPC-style messaging protocol for performance reasons.
Design and test against service contracts. When services expose well-defined APIs, you can develop and
test against those APIs. That way, you can develop and test an individual service without spinning up all of its
dependent services. (Of course, you would still perform integration and load testing against the real services.)
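A lightweight way to make this concrete is a consumer-side contract check that runs against a stub instead of the real service, as in the sketch below. The contract fields are hypothetical; in practice the contract would come from the service's versioned API definition.

```python
# A minimal sketch of a consumer-side contract check against a stubbed response.
# The contract fields are hypothetical examples, not a real service definition.
EXPECTED_CONTRACT = {"orderId": str, "status": str, "total": float}

def satisfies_contract(payload: dict) -> bool:
    """Return True if the payload carries every promised field with the right type."""
    return all(
        field in payload and isinstance(payload[field], expected_type)
        for field, expected_type in EXPECTED_CONTRACT.items()
    )

# During unit tests, a stub stands in for the real service.
stubbed_response = {"orderId": "order-1", "status": "confirmed", "total": 42.5}
assert satisfies_contract(stubbed_response)
```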
Abstract infrastructure away from domain logic. Don't let domain logic get mixed up with infrastructure-
related functionality, such as messaging or persistence. Otherwise, changes in the domain logic will require
updates to the infrastructure layers and vice versa.
Offload cross-cutting concerns to a separate service. For example, if several services need to
authenticate requests, you could move this functionality into its own service. Then you could evolve the
authentication service — for example, by adding a new authentication flow — without touching any of the
services that use it.
Deploy services independently. When the DevOps team can deploy a single service independently of other
services in the application, updates can happen more quickly and safely. Bug fixes and new features can be
rolled out at a more regular cadence. Design both the application and the release process to support
independent updates.
Build for the needs of the business
Recommendations
Define business objectives, including the recovery time objective (RTO), recovery point objective (RPO), and
maximum tolerable outage (MTO). These numbers should inform decisions about the architecture. For example,
to achieve a low RTO, you might implement automated failover to a secondary region. But if your solution can
tolerate a higher RTO, that degree of redundancy might be unnecessary.
Document service level agreements (SLA) and service level objectives (SLO), including availability
and performance metrics. You might build a solution that delivers 99.95% availability. Is that enough? The
answer is a business decision.
Model the application around the business domain. Start by analyzing the business requirements. Use
these requirements to model the application. Consider using a domain-driven design (DDD) approach to create
domain models that reflect the business processes and use cases.
Capture both functional and nonfunctional requirements. Functional requirements let you judge
whether the application does the right thing. Nonfunctional requirements let you judge whether the application
does those things well. In particular, make sure that you understand your requirements for scalability,
availability, and latency. These requirements will influence design decisions and choice of technology.
Decompose by workload. The term "workload" in this context means a discrete capability or computing task,
which can be logically separated from other tasks. Different workloads may have different requirements for
availability, scalability, data consistency, and disaster recovery.
Plan for growth. A solution might meet your current needs, in terms of number of users, volume of
transactions, data storage, and so forth. However, a robust application can handle growth without major
architectural changes. See Design to scale out and Partition around limits. Also consider that your business
model and business requirements will likely change over time. If an application's service model and data models
are too rigid, it becomes hard to evolve the application for new use cases and scenarios. See Design for
evolution.
Manage costs. In a traditional on-premises application, you pay upfront for hardware as a capital expenditure.
In a cloud application, you pay for the resources that you consume. Make sure that you understand the pricing
model for the services that you consume. The total cost will include network bandwidth usage, storage, IP
addresses, service consumption, and other factors. For more information, see Azure pricing. Also consider your
operations costs. In the cloud, you don't have to manage the hardware or other infrastructure, but you still need
to manage your applications, including DevOps, incident response, disaster recovery, and so forth.
Choose a Kubernetes at the edge compute option
This document discusses the trade-offs for various options available for extending compute on the edge. The
following considerations for each Kubernetes option are covered:
Operational cost. The expected labor required to maintain and operate the Kubernetes clusters.
Ease of configuration. The level of difficulty to configure and deploy a Kubernetes cluster.
Flexibility. A measure of how adaptable the Kubernetes option is to integrate a customized
configuration with existing infrastructure at the edge.
Mixed node. Ability to run a Kubernetes cluster with both Linux and Windows nodes.
Assumptions
You are a cluster operator looking to understand different options for running Kubernetes at the edge
and managing clusters in Azure.
You have a good understanding of existing infrastructure and any other infrastructure requirements,
including storage and networking requirements.
After reading this document, you'll be in a better position to identify which option best fits your scenario and the
environment required.
*Other managed edge platforms (OpenShift, Tanzu, and so on) aren't in scope for this document.
**These values are based on using kubeadm, for the sake of simplicity. Different options for running bare-metal
Kubernetes at the edge would alter the rating in these categories.
Bare-metal Kubernetes
Ground-up configuration of Kubernetes using tools like kubeadm on any underlying infrastructure.
The biggest constraints for bare-metal Kubernetes are around the specific needs and requirements of the
organization. The opportunity to use any distribution, networking interface, and plugin means higher complexity
and operational cost. But this offers the most flexible option for customizing your cluster.
Scenario
Often, edge locations have specific requirements for running Kubernetes clusters that aren't met with the other
Azure solutions described in this document. This means the option is typically best for those unable to use
managed services due to unsupported existing infrastructure, or those who want maximum control of
their clusters.
This option can be especially difficult for those who are new to Kubernetes. This isn't uncommon for
organizations looking to run edge clusters. Options like MicroK8s or k3s aim to flatten that learning
curve.
It's important to understand any underlying infrastructure and any integration that is expected to take
place up front. This will help to narrow down viable options and to identify any gaps with the open-
source tooling and/or plugins.
Enabling clusters with Azure Arc presents a simple way to manage your cluster from Azure alongside
other resources. This also brings other Azure capabilities to your cluster, including Azure Policy, Azure
Monitor, Microsoft Defender for Cloud, and other services.
Because cluster configuration isn't trivial, it's especially important to be mindful of CI/CD. Tracking and
acting on upstream changes of various plugins, and making sure those changes don't affect the health of
your cluster, becomes a direct responsibility. It's important for you to have a strong CI/CD solution, strong
testing, and monitoring in place.
Tooling options
Cluster bootstrap:
kubeadm: Kubernetes tool for creating ground-up Kubernetes clusters. Good for standard compute
resources (Linux/Windows).
MicroK8s: Simplified administration and configuration ("LowOps"), conformant Kubernetes by Canonical.
k3s: Certified Kubernetes distribution built for Internet of Things (IoT) and edge computing.
Storage:
Explore available CSI drivers: Many options are available to fit your requirements from cloud to local file
shares.
Networking:
A full list of available add-ons can be found here: Networking add-ons. Some popular options include
Flannel, a simple overlay network, and Calico, which provides a full networking stack.
Considerations
Operational cost:
Without the support that comes with managed services, it's up to the organization to maintain and operate
the cluster as a whole (storage, networking, upgrades, observability, application management). The
operational cost is considered high.
Ease of configuration:
Evaluating the many open-source options at every stage of configuration (networking, storage, monitoring, and
so on) is unavoidable and can become complex. Configuring CI/CD for the cluster also requires extra
consideration. Because of these concerns, the ease of configuration is considered difficult.
Flexibility:
With the ability to use any open-source tool or plugin without any provider restrictions, bare-metal
Kubernetes is highly flexible.
AKS on HCI
Note: This option is currently in preview.
AKS-HCI is a set of predefined settings and configurations that is used to deploy one or more Kubernetes
clusters (with Windows Admin Center or PowerShell modules) on a multi-node cluster running either Windows
Server 2019 Datacenter or Azure Stack HCI 20H2.
Scenario
Ideal for those who want a simplified and streamlined way to get a Microsoft-supported cluster on compatible
devices (Azure Stack HCI or Windows Server 2019 Datacenter). Operations and configuration complexity are
reduced at the expense of the flexibility when compared to the bare-metal Kubernetes option.
Considerations
At the time of this writing, the preview comes with many limitations (permissions, networking limitations, large
compute requirements, and documentation gaps). Purposes other than evaluation and development are
discouraged at this time.
Operational cost:
Microsoft-supported cluster minimizes operational costs.
Ease of configuration:
Pre-configured and well-documented Kubernetes cluster deployment simplifies the configuration required
compared to bare-metal Kubernetes.
Flexibility:
Cluster configuration itself is set, but Admin permissions are granted. The underlying infrastructure must
either be Azure Stack HCI or Windows Server 2019. This option is more flexible than Kubernetes on Azure
Stack Edge and less flexible than bare-metal Kubernetes.
Next steps
For more information, see the following articles:
What is Azure IoT Edge
Kubernetes on your Azure Stack Edge Pro GPU device
Use IoT Edge module to run a Kubernetes stateless application on your Azure Stack Edge Pro GPU device
Deploy a Kubernetes stateless application via kubectl on your Azure Stack Edge Pro GPU device
AI at the edge with Azure Stack Hub
Building a CI/CD pipeline for microservices on Kubernetes
Use Kubernetes dashboard to monitor your Azure Stack Edge Pro GPU device
Understand data store models
Modern business systems manage increasingly large volumes of heterogeneous data. This heterogeneity means
that a single data store is usually not the best approach. Instead, it's often better to store different types of data
in different data stores, each focused toward a specific workload or usage pattern. The term polyglot persistence
is used to describe solutions that use a mix of data store technologies. Therefore, it's important to understand
the main storage models and their tradeoffs.
Selecting the right data store for your requirements is a key design decision. There are literally hundreds of
implementations to choose from among SQL and NoSQL databases. Data stores are often categorized by how
they structure data and the types of operations they support. This article describes several of the most common
storage models. Note that a particular data store technology may support multiple storage models. For
example, a relational database management system (RDBMS) may also support key/value or graph storage. In
fact, there is a general trend for so-called multi-model support, where a single database system supports
several models. But it's still useful to understand the different models at a high level.
Not all data stores in a given category provide the same feature-set. Most data stores provide server-side
functionality to query and process data. Sometimes this functionality is built into the data storage engine. In
other cases, the data storage and processing capabilities are separated, and there may be several options for
processing and analysis. Data stores also support different programmatic and management interfaces.
Generally, you should start by considering which storage model is best suited for your requirements. Then
consider a particular data store within that category, based on factors such as feature set, cost, and ease of
management.
NOTE
Learn more about identifying and reviewing your data service requirements for cloud adoption, in the Microsoft Cloud
Adoption Framework for Azure. Likewise, you can also learn about selecting storage tools and services.
Key/value stores
A key/value store associates each data value with a unique key. Most key/value stores only support simple
query, insert, and delete operations. To modify a value (either partially or completely), an application must
overwrite the existing data for the entire value. In most implementations, reading or writing a single value is an
atomic operation.
An application can store arbitrary data as a set of values. Any schema information must be provided by the
application. The key/value store simply retrieves or stores the value by key.
Key/value stores are highly optimized for applications performing simple lookups, but are less suitable if you
need to query data across different key/value stores. Key/value stores are also not optimized for querying by
value.
A single key/value store can be extremely scalable, as the data store can easily distribute data across multiple
nodes on separate machines.
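The access pattern is simple get/set by key, as in this hedged sketch against Azure Cache for Redis using the redis-py client; the host name, access key, and keys themselves are placeholders.

```python
# A minimal sketch of key/value access against Azure Cache for Redis.
# Host name and access key are placeholders; requires the redis package.
import json
import redis

cache = redis.Redis(
    host="<your-cache>.redis.cache.windows.net",
    port=6380,
    ssl=True,
    password="<your-access-key>",
)

# The store treats the value as opaque; the application owns the schema.
cache.set("session:42", json.dumps({"user": "alice", "cart": ["sku-1", "sku-2"]}), ex=1800)
session = json.loads(cache.get("session:42"))
```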
Azure services
Azure Cosmos DB Table API and SQL API | (Cosmos DB Security Baseline)
Azure Cache for Redis | (Security Baseline)
Azure Table Storage | (Security Baseline)
Workload
Data is accessed using a single key, like a dictionary.
No joins, locks, or unions are required.
No aggregation mechanisms are used.
Secondary indexes are generally not used.
Data type
Each key is associated with a single value.
There is no schema enforcement.
No relationships between entities.
Examples
Data caching
Session management
User preference and profile management
Product recommendation and ad serving
Document databases
A document database stores a collection of documents, where each document consists of named fields and data.
The data can be simple values or complex elements such as lists and child collections. Documents are retrieved
by unique keys.
Typically, a document contains the data for a single entity, such as a customer or an order. A document may
contain information that would be spread across several relational tables in an RDBMS. Documents don't need
to have the same structure. Applications can store different data in documents as business requirements change.
Azure service
Azure Cosmos DB SQL API | (Cosmos DB Security Baseline)
Workload
Insert and update operations are common.
No object-relational impedance mismatch. Documents can better match the object structures used in
application code.
Individual documents are retrieved and written as a single block.
Data requires index on multiple fields.
Data type
Data can be managed in a de-normalized way.
Size of individual document data is relatively small.
Each document type can use its own schema.
Documents can include optional fields.
Document data is semi-structured, meaning that data types of each field are not strictly defined.
Examples
Product catalog
Content management
Inventory management
Graph databases
A graph database stores two types of information, nodes and edges. Edges specify relationships between nodes.
Nodes and edges can have properties that provide information about that node or edge, similar to columns in a
table. Edges can also have a direction indicating the nature of the relationship.
Graph databases can efficiently perform queries across the network of nodes and edges and analyze the
relationships between entities. The following diagram shows an organization's personnel database structured as
a graph. The entities are employees and departments, and the edges indicate reporting relationships and the
departments in which employees work.
This structure makes it straightforward to perform queries such as "Find all employees who report directly or
indirectly to Sarah" or "Who works in the same department as John?" For large graphs with lots of entities and
relationships, you can perform very complex analyses very quickly. Many graph databases provide a query
language that you can use to traverse a network of relationships efficiently.
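For example, a question like "who reports directly to Sarah?" can be expressed as a short Gremlin traversal. The sketch below uses the gremlinpython driver against a hypothetical Cosmos DB Gremlin API account; the account, database, graph, vertex label, and edge label are all placeholders.

```python
# A minimal sketch of a Gremlin traversal that finds Sarah's direct reports.
# The account, database, graph, and the employee/reportsTo labels are placeholders.
from gremlin_python.driver import client, serializer

gremlin = client.Client(
    "wss://<your-account>.gremlin.cosmos.azure.com:443/",
    "g",
    username="/dbs/<database>/colls/<graph>",
    password="<your-key>",
    message_serializer=serializer.GraphSONSerializersV2d0(),
)

query = "g.V().has('employee','name','Sarah').in('reportsTo').values('name')"
direct_reports = gremlin.submit(query).all().result()  # blocks until the traversal completes
gremlin.close()
```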
Azure services
Azure Cosmos DB Gremlin API | (Security Baseline)
SQL Server | (Security Baseline)
Workload
Complex relationships between data items involving many hops between related data items.
The relationship between data items are dynamic and change over time.
Relationships between objects are first-class citizens, without requiring foreign-keys and joins to traverse.
Data type
Nodes and relationships.
Nodes are similar to table rows or JSON documents.
Relationships are just as important as nodes, and are exposed directly in the query language.
Composite objects, such as a person with multiple phone numbers, tend to be broken into separate, smaller
nodes, combined with traversable relationships.
Examples
Organization charts
Social graphs
Fraud detection
Recommendation engines
Data analytics
Data analytics stores provide massively parallel solutions for ingesting, storing, and analyzing data. The data is
distributed across multiple servers to maximize scalability. Large data file formats such as delimited files (CSV),
Parquet, and ORC are widely used in data analytics. Historical data is typically stored in data stores such as Blob
storage or Azure Data Lake Storage Gen2, which are then accessed by Azure Synapse, Databricks, or HDInsight
as external tables. A typical scenario that uses data stored as Parquet files for performance is described in the
article Use external tables with Synapse SQL.
Azure services
Azure Synapse Analytics | (Security Baseline)
Azure Data Lake | (Security Baseline)
Azure Data Explorer | (Security Baseline)
Azure Analysis Services
HDInsight | (Security Baseline)
Azure Databricks | (Security Baseline)
Workload
Data analytics
Enterprise BI
Data type
Historical data from multiple sources.
Usually denormalized in a "star" or "snowflake" schema, consisting of fact and dimension tables.
Usually loaded with new data on a scheduled basis.
Dimension tables often include multiple historic versions of an entity, referred to as a slowly changing
dimension.
Examples
Enterprise data warehouse
Column-family databases
A column-family database organizes data into rows and columns. In its simplest form, a column-family database
can appear very similar to a relational database, at least conceptually. The real power of a column-family
database lies in its denormalized approach to structuring sparse data.
You can think of a column-family database as holding tabular data with rows and columns, but the columns are
divided into groups known as column families. Each column family holds a set of columns that are logically
related together and are typically retrieved or manipulated as a unit. Other data that is accessed separately can
be stored in separate column families. Within a column family, new columns can be added dynamically, and
rows can be sparse (that is, a row doesn't need to have a value for every column).
The following diagram shows an example with two column families, Identity and Contact Info. The data for a
single entity has the same row key in each column-family. This structure, where the rows for any given object in
a column family can vary dynamically, is an important benefit of the column-family approach, making this form
of data store highly suited for storing structured, volatile data.
Unlike a key/value store or a document database, most column-family databases store data in key order, rather
than by computing a hash. Many implementations allow you to create indexes over specific columns in a
column-family. Indexes let you retrieve data by column value, rather than row key.
Read and write operations for a row are usually atomic within a single column-family, although some
implementations provide atomicity across the entire row, spanning multiple column-families.
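The sketch below shows typical row-oriented access with the DataStax cassandra-driver, which can also target the Cosmos DB Cassandra API when the appropriate authentication and TLS settings are supplied. The contact point, keyspace, table, and column names are placeholders, and connection security is omitted for brevity.

```python
# A minimal sketch of column-family style access with the Cassandra driver.
# Contact point, keyspace, table, and column names are placeholders; the
# authentication and TLS settings required by managed services are omitted.
from datetime import datetime, timezone
from cassandra.cluster import Cluster

cluster = Cluster(["<contact-point>"])
session = cluster.connect("telemetry")

# Rows are keyed by device_id; additional columns can be introduced later
# without rewriting existing rows.
session.execute(
    "INSERT INTO readings (device_id, ts, temperature, humidity) VALUES (%s, %s, %s, %s)",
    ("device-7", datetime.now(timezone.utc), 21.4, 0.43),
)
rows = session.execute(
    "SELECT temperature, humidity FROM readings WHERE device_id = %s", ("device-7",)
)
```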
Azure services
Azure Cosmos DB Cassandra API | (Security Baseline)
HBase in HDInsight | (Security Baseline)
Workload
Most column-family databases perform write operations extremely quickly.
Update and delete operations are rare.
Designed to provide high throughput and low-latency access.
Supports easy query access to a particular set of fields within a much larger record.
Massively scalable.
Data type
Data is stored in tables consisting of a key column and one or more column families.
Specific columns can vary by individual rows.
Individual cells are accessed via get and put commands.
Multiple rows are returned using a scan command.
Examples
Recommendations
Personalization
Sensor data
Telemetry
Messaging
Social media analytics
Web analytics
Activity monitoring
Weather and other time-series data
Object storage
Object storage is optimized for storing and retrieving large binary objects (images, files, video and audio
streams, large application data objects and documents, virtual machine disk images). Large data files are also
commonly used in this model, for example, delimited files (CSV), Parquet, and ORC. Object stores can manage
extremely large amounts of unstructured data.
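A hedged sketch of the typical workflow with the azure-storage-blob SDK is shown below; the connection string, container, blob, and local file names are placeholders.

```python
# A minimal sketch of uploading and retrieving a binary object in Blob Storage.
# The connection string, container, blob, and local file names are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<your-storage-connection-string>")
blob = service.get_blob_client(container="media", blob="videos/intro.mp4")

with open("intro.mp4", "rb") as data:
    blob.upload_blob(data, overwrite=True)   # the value is opaque to the store

content = blob.download_blob().readall()     # retrieve the object by its key (the blob name)
```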
Azure service
Azure Blob Storage | (Security Baseline)
Azure Data Lake Storage Gen2 | (Security Baseline)
Workload
Identified by key.
Content is typically an asset such as a delimited file, image, or video file.
Content must be durable and external to any application tier.
Data type
Data size is large.
Value is opaque.
Examples
Images, videos, office documents, PDFs
Static HTML, JSON, CSS
Log and audit files
Database backups
Shared files
Sometimes, using simple flat files can be the most effective means of storing and retrieving information. Using
file shares enables files to be accessed across a network. Given appropriate security and concurrent access
control mechanisms, sharing data in this way can enable distributed services to provide highly scalable data
access for performing basic, low-level operations such as simple read and write requests.
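Because an SMB share appears to clients as an ordinary directory, standard I/O is all that's needed, as in the sketch below; the mount point is a placeholder (on Windows it would typically be a mapped drive letter).

```python
# A minimal sketch of reading and writing files on an Azure Files share that has
# been mounted over SMB. The mount point /mnt/shared is a placeholder.
from pathlib import Path

share = Path("/mnt/shared/reports")
share.mkdir(parents=True, exist_ok=True)

# Standard I/O libraries work unchanged against the share.
(share / "daily.csv").write_text("date,total\n2022-03-10,42\n")
print((share / "daily.csv").read_text())
```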
Azure service
Azure Files | (Security Baseline)
Workload
Migration from existing apps that interact with the file system.
Requires SMB interface.
Data type
Files in a hierarchical set of folders.
Accessible with standard I/O libraries.
Examples
Legacy files
Shared content accessible among a number of VMs or app instances
Aided with this understanding of different data storage models, the next step is to evaluate your workload and
application, and decide which data store will meet your specific needs. Use the data storage decision tree to help
with this process.
Select an Azure data store for your application
Azure offers a number of managed data storage solutions, each providing different features and capabilities.
This article will help you to choose a managed data store for your application.
If your application consists of multiple workloads, evaluate each workload separately. A complete solution may
incorporate multiple data stores.
Select a candidate
Use the following flowchart to select a candidate Azure managed data store.
The output from this flowchart is a starting point for consideration. Next, perform a more detailed evaluation
of the data store to see if it meets your needs. Refer to Criteria for choosing a data store to aid in this evaluation.
Criteria for choosing a data store
This article describes the comparison criteria you should use when evaluating a data store. The goal is to help
you determine which data storage types can meet your solution's requirements.
General considerations
Keep the following considerations in mind when making your selection.
Functional requirements
Data format. What type of data are you intending to store? Common types include transactional data,
JSON objects, telemetry, search indexes, or flat files.
Data size. How large are the entities you need to store? Will these entities need to be maintained as a
single document, or can they be split across multiple documents, tables, collections, and so forth?
Scale and structure. What is the overall amount of storage capacity you need? Do you anticipate
partitioning your data?
Data relationships. Will your data need to support one-to-many or many-to-many relationships? Are
relationships themselves an important part of the data? Will you need to join or otherwise combine data
from within the same dataset, or from external datasets?
Consistency model. How important is it for updates made in one node to appear in other nodes, before
further changes can be made? Can you accept eventual consistency? Do you need ACID guarantees for
transactions?
Schema flexibility. What kind of schemas will you apply to your data? Will you use a fixed schema, a
schema-on-write approach, or a schema-on-read approach?
Concurrency. What kind of concurrency mechanism do you want to use when updating and
synchronizing data? Will the application perform many updates that could potentially conflict? If so, you
may require record locking and pessimistic concurrency control. Alternatively, can you support optimistic
concurrency control? If so, is simple timestamp-based concurrency control enough, or do you need the
added functionality of multi-version concurrency control?
Data movement. Will your solution need to perform ETL tasks to move data to other stores or data
warehouses?
Data lifecycle. Is the data write-once, read-many? Can it be moved into cool or cold storage?
Other supported features. Do you need any other specific features, such as schema validation,
aggregation, indexing, full-text search, MapReduce, or other query capabilities?
Non-functional requirements
Performance and scalability. What are your data performance requirements? Do you have specific
requirements for data ingestion rates and data processing rates? What are the acceptable response times
for querying and aggregation of data once ingested? How large will you need the data store to scale up?
Is your workload more read-heavy or write-heavy?
Reliability. What overall SLA do you need to support? What level of fault-tolerance do you need to
provide for data consumers? What kind of backup and restore capabilities do you need?
Replication. Will your data need to be distributed among multiple replicas or regions? What kind of data
replication capabilities do you require?
Limits. Will the limits of a particular data store support your requirements for scale, number of
connections, and throughput?
Management and cost
Managed service. When possible, use a managed data service, unless you require specific capabilities
that can only be found in an IaaS-hosted data store.
Region availability. For managed services, is the service available in all Azure regions? Does your
solution need to be hosted in certain Azure regions?
Portability. Will your data need to be migrated to on-premises, external datacenters, or other cloud
hosting environments?
Licensing. Do you have a preference of a proprietary versus OSS license type? Are there any other
external restrictions on what type of license you can use?
Overall cost. What is the overall cost of using the service within your solution? How many instances will
need to run, to support your uptime and throughput requirements? Consider operations costs in this
calculation. One reason to prefer managed services is the reduced operational cost.
Cost effectiveness. Can you partition your data, to store it more cost effectively? For example, can you
move large objects out of an expensive relational database into an object store?
Security
Security. What type of encryption do you require? Do you need encryption at rest? What authentication
mechanism do you want to use to connect to your data?
Auditing. What kind of audit log do you need to generate?
Networking requirements. Do you need to restrict or otherwise manage access to your data from
other network resources? Does data need to be accessible only from inside the Azure environment? Does
the data need to be accessible from specific IP addresses or subnets? Does it need to be accessible from
applications or services hosted on-premises or in other external datacenters?
DevOps
Skill set. Are there particular programming languages, operating systems, or other technology that your
team is particularly adept at using? Are there others that would be difficult for your team to work with?
Clients. Is there good client support for your development languages?
Choose a big data storage technology in Azure
This topic compares options for data storage for big data solutions — specifically, data storage for bulk data
ingestion and batch processing, as opposed to analytical data stores or real-time streaming ingestion.
Azure Cosmos DB
Azure Cosmos DB is Microsoft's globally distributed multi-model database. Cosmos DB guarantees single-digit-
millisecond latencies at the 99th percentile anywhere in the world, offers multiple well-defined consistency
models to fine-tune performance, and guarantees high availability with multi-homing capabilities.
Azure Cosmos DB is schema-agnostic. It automatically indexes all the data without requiring you to deal with
schema and index management. It's also multi-model, natively supporting document, key-value, graph, and
column-family data models.
Azure Cosmos DB features:
Geo-replication
Elastic scaling of throughput and storage worldwide
Five well-defined consistency levels
HBase on HDInsight
Apache HBase is an open-source, NoSQL database that is built on Hadoop and modeled after Google BigTable.
HBase provides random access and strong consistency for large amounts of unstructured and semi-structured
data in a schemaless database organized by column families.
Data is stored in the rows of a table, and data within a row is grouped by column family. HBase is schemaless in
the sense that neither the columns nor the type of data stored in them need to be defined before using them.
The open-source code scales linearly to handle petabytes of data on thousands of nodes. It can rely on data
redundancy, batch processing, and other features that are provided by distributed applications in the Hadoop
ecosystem.
The HDInsight implementation leverages the scale-out architecture of HBase to provide automatic sharding of
tables, strong consistency for reads and writes, and automatic failover. Performance is enhanced by in-memory
caching for reads and high-throughput streaming for writes. In most cases, you'll want to create the HBase
cluster inside a virtual network so other HDInsight clusters and applications can directly access the tables.
Azure Data Explorer
Azure Data Explorer is a fast and highly scalable data exploration service for log and telemetry data. It helps you
handle the many data streams emitted by modern software so you can collect, store, and analyze data. Azure
Data Explorer is ideal for analyzing large volumes of diverse data from any data source, such as websites,
applications, IoT devices, and more. This data is used for diagnostics, monitoring, reporting, machine learning,
and additional analytics capabilities. Azure Data Explorer makes it simple to ingest this data and enables you to
do complex ad hoc queries on the data in seconds.
Azure Data Explorer can be linearly scaled out for increasing ingestion and query processing throughput. An
Azure Data Explorer cluster can be deployed to a Virtual Network for enabling private networks.
Capability matrix
The following tables summarize the key differences in capabilities.
File storage capabilities
C A PA B IL IT Y A Z URE DATA L A K E STO RE A Z URE B LO B STO RA GE C O N TA IN ERS
Purpose Optimized storage for big data General purpose object store for a
analytics workloads wide variety of storage scenarios
Use cases Batch, streaming analytics, and Any type of text or binary data, such
machine learning data such as log files, as application back end, backup data,
IoT data, click streams, large datasets media storage for streaming, and
general purpose data
Authentication protocol OAuth 2.0. Calls must contain a valid Hash-based message authentication
JWT (JSON web token) issued by Azure code (HMAC). Calls must contain a
Active Directory Base64-encoded SHA-256 hash over a
part of the HTTP request.
C A PA B IL IT Y A Z URE DATA L A K E STO RE A Z URE B LO B STO RA GE C O N TA IN ERS
Authorization POSIX access control lists (ACLs). ACLs For account-level authorization use
based on Azure Active Directory Account Access Keys. For account,
identities can be set file and folder container, or blob authorization use
level. Shared Access Signature Keys.
Developer SDKs .NET, Java, Python, Node.js .NET, Java, Python, Node.js, C++, Ruby
Analytics workload performance Optimized performance for parallel Not optimized for analytics workloads
analytics workloads, High Throughput
and IOPS
Size limits No limits on account sizes, file sizes or Specific limits documented here
number of files
Primary database model Document store, graph, key-value Wide column store
store, wide column store
SQL language support Yes Yes (using the Phoenix JDBC driver)
Pricing model Elastically scalable request units (RUs) Per-minute pricing for HDInsight
charged per-second as needed, cluster (horizontal scaling of nodes),
elastically scalable storage storage
Online transaction processing (OLTP)
The management of transactional data using computer systems is referred to as online transaction processing
(OLTP). OLTP systems record business interactions as they occur in the day-to-day operation of the organization,
and support querying of this data to make inferences.
Transactional data
Transactional data is information that tracks the interactions related to an organization's activities. These
interactions are typically business transactions, such as payments received from customers, payments made to
suppliers, products moving through inventory, orders taken, or services delivered. Transactional events, which
represent the transactions themselves, typically contain a time dimension, some numerical values, and
references to other data.
Transactions typically need to be atomic and consistent. Atomicity means that an entire transaction always
succeeds or fails as one unit of work, and is never left in a half-completed state. If a transaction cannot be
completed, the database system must roll back any steps that were already done as part of that transaction. In a
traditional RDBMS, this rollback happens automatically if a transaction cannot be completed. Consistency means
that transactions always leave the data in a valid state. (These are very informal descriptions of atomicity and
consistency. There are more formal definitions of these properties, such as ACID.)
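The sketch below shows what this looks like from application code using pyodbc: either both statements commit together, or the explicit rollback leaves the data untouched. The connection string, table, and column names are placeholders.

```python
# A minimal sketch of an atomic unit of work with pyodbc.
# The connection string, table, and column names are placeholders.
import pyodbc

conn = pyodbc.connect("<your-odbc-connection-string>", autocommit=False)
cursor = conn.cursor()
try:
    cursor.execute("INSERT INTO Orders (OrderId, Total) VALUES (?, ?)", ("order-1", 42.50))
    cursor.execute("UPDATE Inventory SET Quantity = Quantity - 1 WHERE Sku = ?", ("sku-1",))
    conn.commit()      # both changes become visible together
except Exception:
    conn.rollback()    # neither change takes effect
    raise
finally:
    conn.close()
```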
Transactional databases can support strong consistency for transactions using various locking strategies, such as
pessimistic locking, to ensure that all data is strongly consistent within the context of the enterprise, for all users
and processes.
The most common deployment architecture that uses transactional data is the data store tier in a 3-tier
architecture. A 3-tier architecture typically consists of a presentation tier, business logic tier, and data store tier. A
related deployment architecture is the N-tier architecture, which may have multiple middle-tiers handling
business logic.
Typical requirements for transactional data include:
Updateable: Yes
Appendable: Yes
Model: Relational
Challenges
Implementing and using an OLTP system can create a few challenges:
OLTP systems are not always good for handling aggregates over large amounts of data, although there are
exceptions, such as a well-planned SQL Server-based solution. Analytics against the data that rely on
aggregate calculations over millions of individual transactions are very resource intensive for an OLTP
system. They can be slow to execute and can cause a slow-down by blocking other transactions in the
database.
When conducting analytics and reporting on data that is highly normalized, the queries tend to be complex,
because most queries need to de-normalize the data by using joins. Also, naming conventions for database
objects in OLTP systems tend to be terse and succinct. The increased normalization coupled with terse
naming conventions makes OLTP systems difficult for business users to query, without the help of a DBA or
data developer.
Storing the history of transactions indefinitely and storing too much data in any one table can lead to slow
query performance, depending on the number of transactions stored. The common solution is to maintain a
relevant window of time (such as the current fiscal year) in the OLTP system and offload historical data to
other systems, such as a data mart or data warehouse.
OLTP in Azure
Applications such as websites hosted in App Service Web Apps, REST APIs running in App Service, or mobile or
desktop applications communicate with the OLTP system, typically via a REST API intermediary.
In practice, most workloads are not purely OLTP. There tends to be an analytical component as well. In addition,
there is an increasing demand for real-time reporting, such as running reports against the operational system.
This is also referred to as HTAP (Hybrid Transactional and Analytical Processing). For more information, see
Online Analytical Processing (OLAP).
In Azure, all of the following data stores will meet the core requirements for OLTP and the management of
transaction data:
Azure SQL Database
SQL Server in an Azure virtual machine
Azure Database for MySQL
Azure Database for PostgreSQL
Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities
Each capability is compared across Azure SQL Database, SQL Server in an Azure virtual machine, Azure
Database for MySQL, and Azure Database for PostgreSQL.
[1] Not including client driver support, which allows many programming languages to connect to and use the
OLTP data store.
Scalability capabilities
Each capability is compared across the same four services.
Availability capabilities
Each capability is compared across the same four services.
Security capabilities
Each capability is compared across the same four services. For example:
Private IP: No (Azure SQL Database), Yes (SQL Server in an Azure virtual machine), No (Azure Database for
MySQL), No (Azure Database for PostgreSQL)
Choose a data pipeline orchestration technology in
Azure
Most big data solutions consist of repeated data processing operations, encapsulated in workflows. A pipeline
orchestrator is a tool that helps to automate these workflows. An orchestrator can schedule jobs, execute
workflows, and coordinate dependencies among tasks.
Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities
Management tools
  Azure Data Factory: Azure Portal, PowerShell, CLI, .NET SDK
  SQL Server Integration Services (SSIS): SSMS, PowerShell
  Oozie on HDInsight: Bash shell, Oozie REST API, Oozie web UI
Pricing
  Azure Data Factory: Pay per usage
  SQL Server Integration Services (SSIS): Licensing / pay for features
  Oozie on HDInsight: No additional charge on top of running the HDInsight cluster
Pipeline capabilities
Spark
  Azure Data Factory: Yes
  SQL Server Integration Services (SSIS): No
  Oozie on HDInsight: No
Scalability capabilities
Scale up
  Azure Data Factory: Yes
  SQL Server Integration Services (SSIS): No
  Oozie on HDInsight: No
Choose a search data store in Azure
This article compares technology choices for search data stores in Azure. A search data store is used to create
and store specialized indexes for performing searches on free-form text. The text that is indexed may reside in a
separate data store, such as blob storage. An application submits a query to the search data store, and the result
is a list of matching documents. For more information about this scenario, see Processing free-form text for
search.
Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities
Each capability is compared across Azure Cognitive Search, Elasticsearch, HDInsight with Solr, and SQL
Database.
Manageability capabilities
Each capability is compared across the same services.
Security capabilities
Each capability is compared across the same services.
See also
Processing free-form text for search
Transfer data to and from Azure
There are several options for transferring data to and from Azure, depending on your needs.
Physical transfer
Using physical hardware to transfer data to Azure is a good option when:
Your network is slow or unreliable.
Getting additional network bandwidth is cost-prohibitive.
Security or organizational policies do not allow outbound connections when dealing with sensitive data.
If your primary concern is how long it will take to transfer your data, you may want to run a test to verify
whether network transfer is actually slower than physical transport.
There are two main options for physically transporting data to Azure:
Azure Import/Export. The Azure Import/Export service lets you securely transfer large amounts of data
to Azure Blob Storage or Azure Files by shipping internal SATA HDDs or SSDs to an Azure datacenter. You
can also use this service to transfer data from Azure Storage to hard disk drives and have these shipped
to you for loading on-premises.
Azure Data Box. Azure Data Box is a Microsoft-provided appliance that works much like the Azure
Import/Export service. Microsoft ships you a proprietary, secure, and tamper-resistant transfer appliance
and handles the end-to-end logistics, which you can track through the portal. One benefit of the Azure
Data Box service is ease of use. You don't need to purchase several hard drives, prepare them, and
transfer files to each one. Azure Data Box is supported by a number of industry-leading Azure partners to
make it easier to seamlessly use offline transport to the cloud from their products.
Graphical interface
Consider the following options if you are only transferring a few files or data objects and don't need to
automate the process.
Azure Storage Explorer. Azure Storage Explorer is a cross-platform tool that lets you manage the
contents of your Azure storage accounts. It allows you to upload, download, and manage blobs, files,
queues, tables, and Azure Cosmos DB entities. Use it with Blob storage to manage blobs and folders, as
well as upload and download blobs between your local file system and Blob storage, or between storage
accounts.
Azure portal. Both Blob storage and Data Lake Store provide a web-based interface for exploring files
and uploading new files one at a time. This is a good option if you do not want to install any tools or issue
commands to quickly explore your files, or to simply upload a handful of new ones.
Data pipeline
Azure Data Factory. Azure Data Factory is a managed service best suited for regularly transferring files
between a number of Azure services, on-premises, or a combination of the two. Using Azure Data Factory, you
can create and schedule data-driven workflows (called pipelines) that ingest data from disparate data stores. It
can process and transform the data by using compute services such as Azure HDInsight Hadoop, Spark, Azure
Data Lake Analytics, and Azure Machine Learning. Create data-driven workflows for orchestrating and
automating data movement and data transformation.
Capability matrix
The following tables summarize the key differences in capabilities.
Physical transfer
Form factor
  Azure Import/Export service: Internal SATA HDDs or SSDs
  Azure Data Box: Secure, tamper-proof, single hardware appliance
Command-line tools (Hadoop on HDInsight)
Each capability is compared across DistCp, Sqoop, and the Hadoop CLI.
Command-line tools (Azure)
Each capability is compared across Azure CLI, AzCopy, PowerShell, AdlCopy, and PolyBase. For example:
Copy to relational database: No (Azure CLI), No (AzCopy), No (PowerShell), No (AdlCopy), Yes (PolyBase)
[1] AdlCopy is optimized for transferring big data when used with a Data Lake Analytics account.
[2] PolyBase performance can be increased by pushing computation to Hadoop and using PolyBase scale-out
groups to enable parallel data transfer between SQL Server instances and Hadoop nodes.
Graphical interface and Azure Data Factory
Each capability is compared across Azure Storage Explorer, the Azure portal*, and Azure Data Factory.
* Azure portal in this case means using the web-based exploration tools for Blob storage and Data Lake Store.
Choose an analytical data store in Azure
In a big data architecture, there is often a need for an analytical data store that serves processed data in a
structured format that can be queried using analytical tools. Analytical data stores that support querying of both
hot-path and cold-path data are collectively referred to as the serving layer, or data serving storage.
The serving layer deals with processed data from both the hot path and cold path. In the lambda architecture,
the serving layer is subdivided into a speed serving layer, which stores data that has been processed
incrementally, and a batch serving layer, which contains the batch-processed output. The serving layer requires
strong support for random reads with low latency. Data storage for the speed layer should also support random
writes, because batch loading data into this store would introduce undesired delays. On the other hand, data
storage for the batch layer does not need to support random writes, but batch writes instead.
There is no single best data management choice for all data storage tasks. Different data management solutions
are optimized for different tasks. Most real-world cloud apps and big data processes have a variety of data
storage requirements and often use a combination of data storage solutions.
Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities
Each capability is compared across Azure SQL Database, Azure Synapse SQL pool, Azure Synapse Spark pool,
Azure Data Explorer, HBase/Phoenix on HDInsight, Hive LLAP on HDInsight, Azure Analysis Services, and Azure
Cosmos DB.
Security capabilities
Each capability is compared across the same services.
The goal of most big data solutions is to provide insights into the data through analysis and reporting. This can
include preconfigured reports and visualizations, or interactive data exploration.
Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities
Embedding capabilities
  Power BI: Yes
  Jupyter Notebooks: No
  Zeppelin Notebooks: No
  Microsoft Azure Notebooks: No
Big data solutions often use long-running batch jobs to filter, aggregate, and otherwise prepare the data for
analysis. Usually these jobs involve reading source files from scalable storage (like HDFS, Azure Data Lake Store,
and Azure Storage), processing them, and writing the output to new files in scalable storage.
The key requirement of such batch processing engines is the ability to scale out computations, in order to
handle a large volume of data. Unlike real-time processing, however, batch processing is expected to have
latencies (the time between data ingestion and computing a result) that measure in minutes to hours.
Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities
Pricing model
  Azure Data Lake Analytics: Per batch job
  Azure Synapse: By cluster hour
  HDInsight: By cluster hour
  Azure Databricks: Databricks Unit + cluster hour
Scale-out granularity
  Azure Data Lake Analytics: Per job
  Azure Synapse: Per cluster
  HDInsight: Per cluster
  Azure Databricks: Per cluster
Next steps
Analytics architecture design
Choose an analytical data store in Azure
Choose a data analytics technology in Azure
Analytics end-to-end with Azure Synapse
Choose a stream processing technology in Azure
This article compares technology choices for real-time stream processing in Azure.
Real-time stream processing consumes messages from either queue or file-based storage, processes the
messages, and forwards the result to another message queue, file store, or database. Processing may include
querying, filtering, and aggregating messages. Stream processing engines must be able to consume endless
streams of data and produce results with minimal latency. For more information, see Real time processing.
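As a small illustration of the consume-process-forward shape, the sketch below reads events from Event Hubs with the azure-eventhub SDK and applies a simple per-event filter. The connection string, event hub name, and event fields are placeholders, and a real pipeline would forward the filtered results to another queue, file store, or database rather than print them.

```python
# A minimal sketch of consuming a stream from Event Hubs and filtering it.
# The connection string, event hub name, and event fields are placeholders.
import json
from azure.eventhub import EventHubConsumerClient

consumer = EventHubConsumerClient.from_connection_string(
    "<your-event-hubs-connection-string>",
    consumer_group="$Default",
    eventhub_name="telemetry",
)

def on_event(partition_context, event):
    reading = json.loads(event.body_as_str())
    if reading.get("temperature", 0) > 30:   # simple per-event filter
        print("hot device:", reading.get("deviceId"))

with consumer:
    # Blocks and invokes on_event for each message; "-1" starts from the
    # beginning of the stream.
    consumer.receive(on_event=on_event, starting_position="-1")
```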
Capability matrix
The following tables summarize the key differences in capabilities.
General capabilities
Programmability
  Azure Stream Analytics: Stream Analytics query language, JavaScript
  HDInsight with Spark Streaming: C#/F#, Java, Python, Scala
  Apache Spark in Azure Databricks: C#/F#, Java, Python, R, Scala
  HDInsight with Storm: C#, Java
  Azure Functions: C#, F#, Java, Node.js, Python
  Azure App Service WebJobs: C#, Java, Node.js, PHP, Python
Pricing model
  Azure Stream Analytics: Streaming units
  HDInsight with Spark Streaming: Per cluster hour
  Apache Spark in Azure Databricks: Databricks units
  HDInsight with Storm: Per cluster hour
  Azure Functions: Per function execution and resource consumption
  Azure App Service WebJobs: Per app service plan hour
Integration capabilities
Inputs
  Azure Stream Analytics: Azure Event Hubs, Azure IoT Hub, Azure Blob storage
  HDInsight with Spark Streaming: Event Hubs, IoT Hub, Kafka, HDFS, Storage Blobs, Azure Data Lake Store
  Apache Spark in Azure Databricks: Event Hubs, IoT Hub, Kafka, HDFS, Storage Blobs, Azure Data Lake Store
  HDInsight with Storm: Event Hubs, IoT Hub, Storage Blobs, Azure Data Lake Store
  Azure Functions: Supported bindings
  Azure App Service WebJobs: Service Bus, Storage Queues, Storage Blobs, Event Hubs, WebHooks, Cosmos DB,
  Files
Sinks
  Azure Stream Analytics: Azure Data Lake Store, Azure SQL Database, Storage Blobs, Event Hubs, Power BI,
  Table Storage, Service Bus Queues, Service Bus Topics, Cosmos DB, Azure Functions
  HDInsight with Spark Streaming: HDFS, Kafka, Storage Blobs, Azure Data Lake Store, Cosmos DB
  Apache Spark in Azure Databricks: HDFS, Kafka, Storage Blobs, Azure Data Lake Store, Cosmos DB
  HDInsight with Storm: Event Hubs, Service Bus, Kafka
  Azure Functions: Supported bindings
  Azure App Service WebJobs: Service Bus, Storage Queues, Storage Blobs, Event Hubs, WebHooks, Cosmos DB,
  Files
Processing capabilities
Input data formats
  Azure Stream Analytics: Avro, JSON or CSV, UTF-8 encoded
  HDInsight with Spark Streaming: Any format using custom code
  Apache Spark in Azure Databricks: Any format using custom code
  HDInsight with Storm: Any format using custom code
  Azure Functions: Any format using custom code
  Azure App Service WebJobs: Any format using custom code
See also:
Choosing a real-time message ingestion technology
Real time processing
Choose a Microsoft cognitive services technology
Microsoft cognitive services are cloud-based APIs that you can use in artificial intelligence (AI) applications and
data flows. They provide you with pretrained models that are ready to use in your application, requiring no data
and no model training on your part. The cognitive services are developed by Microsoft's AI and Research team
and leverage the latest deep learning algorithms. They are consumed over HTTP REST interfaces. In addition,
SDKs are available for many common application development frameworks.
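The sketch below shows the general shape of such a REST call with the requests library. The endpoint path and JSON body are illustrative placeholders rather than the contract of any particular service; the Ocp-Apim-Subscription-Key header is the usual way to pass the resource key.

```python
# A minimal sketch of calling a Cognitive Services REST endpoint. The endpoint
# path and request body are illustrative placeholders; consult the reference
# documentation of the specific service for its actual route and payload.
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com/<service-specific-path>"
headers = {
    "Ocp-Apim-Subscription-Key": "<your-resource-key>",
    "Content-Type": "application/json",
}
body = {"documents": [{"id": "1", "text": "The new phone has a fantastic camera."}]}

response = requests.post(endpoint, headers=headers, json=body, timeout=10)
response.raise_for_status()
print(response.json())
```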
The cognitive services include:
Text analysis
Computer vision
Video analytics
Speech recognition and generation
Natural language understanding
Intelligent search
Key benefits:
Minimal development effort for state-of-the-art AI services.
Easy integration into apps via HTTP REST interfaces.
Built-in support for consuming cognitive services in Azure Data Lake Analytics.
Considerations:
Only available over the web. Internet connectivity is generally required. An exception is the Custom Vision
Service, whose trained model you can export for prediction on devices and at the IoT edge.
Although considerable customization is supported, the available services may not suit all predictive
analytics requirements.
Capability matrix
The following tables summarize the key differences in capabilities.
Uses prebuilt models
Entity Linking API (input: text): Power your app's data links with named entity recognition and disambiguation.
Bing Spell Check API (input: text): Detect and correct spelling mistakes in your app.
Bing Entity Search API (input: text, web search query): Identify and augment entity information from the web.
Bing Image Search API (input: text, web search query): Search for images.
Bing News Search API (input: text, web search query): Search for news.
Bing Video Search API (input: text, web search query): Search for videos.
Bing Web Search API (input: text, web search query): Get enhanced search details from billions of web
documents.
Bing Speech API (input: text or speech): Convert speech to text and back again.
Computer Vision API (input: images, or frames from video): Distill actionable information from images,
automatically create descriptions of photos, derive tags, recognize celebrities, extract text, and create accurate
thumbnails.
Content Moderator (input: text, images, or video): Automated image, text, and video moderation.
Emotion API (input: images, photos with human subjects): Identify the range of emotions of human subjects.
Face API (input: images, photos with human subjects): Detect, identify, analyze, organize, and tag faces in
photos.
Custom Vision Service (input: images, or frames from video): Customize your own computer vision models.
Custom Decision Service (input: web content, for example an RSS feed): Use machine learning to automatically
select the appropriate content for your home page.
Bing Custom Search API (input: text, web search query): Commercial-grade search tool.
Compare the machine learning products and
technologies from Microsoft
Learn about the machine learning products and technologies from Microsoft. Compare options to help you
choose how to most effectively build, deploy, and manage your machine learning solutions.
CLOUD OPTIONS | WHAT IT IS | WHAT YOU CAN DO WITH IT
Azure Machine Learning | Managed platform for machine learning | Use a pretrained model, or train, deploy, and manage models on Azure using Python and the CLI.
Azure Cognitive Services | Pre-built AI capabilities implemented through REST APIs and SDKs | Build intelligent applications quickly using standard programming languages; doesn't require machine learning and data science expertise.
Azure SQL Managed Instance Machine Learning Services | In-database machine learning for SQL | Train and deploy models inside Azure SQL Managed Instance.
Machine learning in Azure Synapse Analytics | Analytics service with machine learning | Train and deploy models inside Azure Synapse Analytics.
Machine learning and AI with ONNX in Azure SQL Edge | Machine learning in SQL on IoT | Train and deploy models inside Azure SQL Edge.
Azure Databricks | Apache Spark-based analytics platform | Build and deploy models and data workflows using integrations with open-source machine learning libraries and the MLflow platform.

ON-PREMISES OPTIONS | WHAT IT IS | WHAT YOU CAN DO WITH IT
SQL Server Machine Learning Services | In-database machine learning for SQL | Train and deploy models inside SQL Server.
Machine Learning Services on SQL Server Big Data Clusters | Machine learning in Big Data Clusters | Train and deploy models on SQL Server Big Data Clusters.

PLATFORMS/TOOLS | WHAT IT IS | WHAT YOU CAN DO WITH IT
Azure Data Science Virtual Machine | Virtual machine with pre-installed data science tools | Develop machine learning solutions in a pre-configured environment.
Machine Learning extension for Azure Data Studio | Open-source and cross-platform machine learning extension for Azure Data Studio | Manage packages, import machine learning models, make predictions, and create notebooks to run experiments for your SQL databases.
Key benefits: Code first (SDK) and studio & drag-and-drop designer web interface authoring options.
Supported languages: Various options depending on the service. Standard ones are C#, Java, JavaScript, and Python.
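As an illustration of the code-first (SDK) authoring path in Azure Machine Learning, here is a minimal Python sketch that submits a training script using the v1 azureml-core SDK; the workspace config file, compute cluster name, source folder, and conda file are assumptions for illustration, and the newer azure-ai-ml SDK offers a similar flow.

```python
# A minimal sketch of the code-first path using the v1 azureml-core SDK; the workspace
# config file, compute cluster name, source folder, and conda file are assumptions.
from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace

ws = Workspace.from_config()  # reads a config.json downloaded from the Azure portal

run_config = ScriptRunConfig(
    source_directory="./src",      # hypothetical folder containing train.py
    script="train.py",
    compute_target="cpu-cluster",  # an existing compute cluster in the workspace
    environment=Environment.from_conda_specification("train-env", "./src/environment.yml"),
)

run = Experiment(ws, "churn-model").submit(run_config)
run.wait_for_completion(show_output=True)  # stream logs until the remote run finishes
```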
Azure Databricks
Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services
platform. Databricks is integrated with Azure to provide one-click setup, streamlined workflows, and an
interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.
Use Python, R, Scala, and SQL code in web-based notebooks to query, visualize, and model data.
Use Databricks when you want to collaborate on building machine learning solutions on Apache Spark.
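For example, a notebook cell along these lines trains a simple model with Spark MLlib. This is a minimal sketch that assumes a Databricks cluster where the SparkSession (spark) and the notebook display() helper are predefined, plus a hypothetical CSV of labeled examples uploaded to DBFS.

```python
# A notebook-style sketch, assuming a Databricks cluster where the SparkSession `spark`
# and the notebook display() helper are predefined, and a hypothetical CSV in DBFS.
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

df = spark.read.csv("/FileStore/tables/churn.csv", header=True, inferSchema=True)

# Combine the (hypothetical) numeric feature columns into a single vector column.
assembler = VectorAssembler(inputCols=["tenure", "monthly_charges"], outputCol="features")
train = assembler.transform(df)

# Fit a simple classifier and inspect predictions against the label column.
model = LogisticRegression(labelCol="churned", featuresCol="features").fit(train)
display(model.transform(train).select("churned", "prediction"))
```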
ML.NET
ML.NET is an open-source, cross-platform machine learning framework. With ML.NET, you can build custom
machine learning solutions and integrate them into your .NET applications. ML.NET offers varying levels of
interoperability with popular frameworks like TensorFlow and ONNX for training and scoring machine learning
and deep learning models. For resource-intensive tasks like training image classification models, you can take
advantage of Azure to train your models in the cloud.
Use ML.NET when you want to integrate machine learning solutions into your .NET applications. Choose
between the API for a code-first experience and Model Builder or the CLI for a low-code experience.
Windows ML
The Windows ML inference engine allows you to use trained machine learning models in your applications,
evaluating them locally on Windows 10 devices.
Use Windows ML when you want to use trained machine learning models within your Windows applications.
MMLSpark
Microsoft ML for Apache Spark (MMLSpark) is an open-source library that expands the distributed computing
framework Apache Spark. MMLSpark adds many deep learning and data science tools to the Spark ecosystem,
including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK),
LightGBM, LIME (Model Interpretability), and OpenCV. You can use these tools to create powerful predictive
models on any Spark cluster, such as Azure Databricks or Cosmic Spark.
MMLSpark also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project,
users can embed any web service into their SparkML models. Additionally, MMLSpark provides easy-to-use
tools for orchestrating Azure Cognitive Services at scale. For production-grade deployment, the Spark Serving
project enables high throughput, submillisecond latency web services, backed by your Spark cluster.
Next steps
To learn about all the Artificial Intelligence (AI) development products available from Microsoft, see Microsoft
AI platform.
For training in developing AI and Machine Learning solutions with Microsoft, see Microsoft Learn.
Choose a real-time message ingestion technology
in Azure
Real-time processing deals with streams of data that are captured in real time and processed with minimal
latency. Many real-time processing solutions need a message ingestion store to act as a buffer for messages,
and to support scale-out processing, reliable delivery, and other message queuing semantics.
Kafka on HDInsight
Apache Kafka is an open-source distributed streaming platform that can be used to build real-time data
pipelines and streaming applications. Kafka also provides message broker functionality similar to a message
queue, where you can publish and subscribe to named data streams. It is horizontally scalable, fault-tolerant, and
extremely fast. Kafka on HDInsight provides Kafka as a managed, highly scalable, and highly available service in Azure.
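The following is a minimal Python sketch of the publish-subscribe pattern against a named topic, using the open-source kafka-python client; the broker address and topic name are placeholders, and on Kafka on HDInsight you would supply the cluster's broker hosts.

```python
# A minimal sketch using the open-source kafka-python client; the broker address and
# topic name are placeholders (on Kafka on HDInsight, use the cluster's broker hosts).
from kafka import KafkaConsumer, KafkaProducer

BROKERS = "broker1:9092"

# Publish a record to a named stream (topic).
producer = KafkaProducer(bootstrap_servers=BROKERS)
producer.send("clickstream", key=b"user-42", value=b'{"action": "page_view"}')
producer.flush()

# Subscribe to the same topic and read records, in order within each partition.
consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers=BROKERS,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating when no new records arrive for 5 seconds
)
for record in consumer:
    print(record.partition, record.offset, record.value)
```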
Some common use cases for Kafka are:
Messaging. Because it supports the publish-subscribe message pattern, Kafka is often used as a message broker.
Activity tracking. Because Kafka provides in-order logging of records, it can be used to track and re-create activities, such as user actions on a web site.
Aggregation. Using stream processing, you can aggregate information from different streams to combine and centralize the information into operational data.
Transformation. Using stream processing, you can combine and enrich data from multiple input topics into one or more output topics.
Capability matrix
The following tables summarize the key differences in capabilities.
CAPABILITY | IOT HUB | EVENT HUBS | KAFKA ON HDINSIGHT
Cloud-to-device communications | Yes | No | No
Protocol support | MQTT, AMQP, HTTPS [1] | AMQP, HTTPS, Kafka Protocol | Kafka Protocol
[1] You can also use Azure IoT protocol gateway as a custom gateway to enable protocol adaptation for IoT Hub.
For more information, see Comparison of Azure IoT Hub and Azure Event Hubs.
Best practices in cloud applications
These best practices can help you build reliable, scalable, and secure applications in the cloud. They offer
guidelines and tips for designing and implementing efficient and robust systems, mechanisms, and approaches.
Many also include code examples that you can use with Azure services. The practices apply to any distributed
system, whether your host is Azure or a different cloud platform.
Catalog of practices
This table lists various best practices. The Related pillars or patterns column contains the following links:
Cloud development challenges that the practice and related design patterns address.
Pillars of the Microsoft Azure Well-Architected Framework that the practice focuses on.
PRACTICE | SUMMARY | RELATED PILLARS OR PATTERNS
API design | Design web APIs to support platform independence by using standard protocols and agreed-upon data formats. Promote service evolution so that clients can discover functionality without requiring modification. Improve response times and prevent transient faults by supporting partial responses and providing ways to filter and paginate data. | Design and implementation, Performance efficiency, Operational excellence
Content delivery network | Use content delivery networks (CDNs) to efficiently deliver web content to users and reduce load on web apps. Overcome deployment, versioning, security, and resilience challenges. | Data management, Performance efficiency
Data partitioning strategies (by service) | Partition data in Azure SQL Database and Azure Storage services like Azure Table Storage and Azure Blob Storage. Shard your data to distribute loads, reduce latency, and support horizontal scaling. | Data management, Performance efficiency, Cost optimization
Host name preservation | Learn why it's important to preserve the original HTTP host name between a reverse proxy and its back-end web application, and how to implement this recommendation for the most common Azure services. | Design and implementation, Reliability
Monitoring and diagnostics | Track system health, usage, and performance with a monitoring and diagnostics pipeline. Turn monitoring data into alerts, reports, and triggers that help in various situations. Examples include detecting and correcting issues, spotting potential problems, meeting performance guarantees, and fulfilling auditing requirements. | Operational excellence
Retry guidance for specific services | Use, adapt, and extend the retry mechanisms that Azure services and client SDKs offer. Develop a systematic and robust approach for managing temporary issues with connections, operations, and resources. | Design and implementation, Reliability
Transient fault handling | Handle transient faults caused by unavailable networks or resources. Overcome challenges when developing appropriate retry strategies. Avoid duplicating layers of retry code and other anti-patterns. A minimal retry sketch follows this table. | Design and implementation, Reliability
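To make the transient fault handling guidance concrete, here is a minimal Python sketch of a retry policy with exponential backoff and jitter; the status codes treated as transient and the delay values are illustrative choices, not prescriptive settings.

```python
# A minimal sketch of retry with exponential backoff and jitter; the transient status
# codes and delay values are illustrative choices, not prescriptive settings.
import random
import time

import requests

TRANSIENT_STATUS = {408, 429, 500, 502, 503, 504}

def get_with_retries(url, max_attempts=4, base_delay=0.5):
    """Issue a GET request, retrying only on faults that are likely to be transient."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.get(url, timeout=10)
            if response.status_code not in TRANSIENT_STATUS:
                return response  # success, or a non-transient error that retrying won't fix
        except requests.ConnectionError:
            pass  # treat dropped connections as transient
        if attempt == max_attempts:
            raise RuntimeError(f"Giving up on {url} after {max_attempts} attempts")
        # Exponential backoff with jitter avoids synchronized retry storms.
        time.sleep(base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1))
```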
Next steps
Web API design
Web API implementation
Related resources
Cloud design patterns
Microsoft Azure Well-Architected Framework
RESTful web API design
Most modern web applications expose APIs that clients can use to interact with the application. A well-designed
web API should aim to support:
Platform independence. Any client should be able to call the API, regardless of how the API is
implemented internally. This requires using standard protocols, and having a mechanism whereby the
client and the web service can agree on the format of the data to exchange.
Service evolution. The web API should be able to evolve and add functionality independently from
client applications. As the API evolves, existing client applications should continue to function without
modification. All functionality should be discoverable so that client applications can fully use it.
This guidance describes issues that you should consider when designing a web API.
What is REST?
In 2000, Roy Fielding proposed Representational State Transfer (REST) as an architectural approach to designing
web services. REST is an architectural style for building distributed systems based on hypermedia. REST is
independent of any underlying protocol and is not necessarily tied to HTTP. However, most common REST API
implementations use HTTP as the application protocol, and this guide focuses on designing REST APIs for HTTP.
A primary advantage of REST over HTTP is that it uses open standards, and does not bind the implementation of
the API or the client applications to any specific implementation. For example, a REST web service could be
written in ASP.NET, and client applications can use any language or toolset that can generate HTTP requests and
parse HTTP responses.
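For example, a client in any such language can issue an HTTP request and parse the JSON response. The following minimal Python sketch calls the hypothetical adventure-works order resource used in the design principles below.

```python
# A minimal client sketch; the order URI and response shape match the hypothetical
# adventure-works example used in the design principles below.
import requests

response = requests.get("https://adventure-works.com/orders/1", timeout=10)
response.raise_for_status()

order = response.json()  # e.g. {"orderId": 1, "orderValue": 99.90, "productId": 1, "quantity": 1}
print(order["orderValue"])
```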
Here are some of the main design principles of RESTful APIs using HTTP:
REST APIs are designed around resources, which are any kind of object, data, or service that can be
accessed by the client.
A resource has an identifier, which is a URI that uniquely identifies that resource. For example, the URI for
a particular customer order might be:
https://adventure-works.com/orders/1
Clients interact with a service by exchanging representations of resources. Many web APIs use JSON as
the exchange format. For example, a GET request to the URI listed above might return this response body:
{"orderId":1,"orderValue":99.90,"productId":1,"quantity":1}
REST APIs use a uniform interface, which helps to decouple the client and service implementations. For
REST APIs built on HTTP, the uniform interface includes using standard HTTP verbs to perform operations
on resources. The most common operations are GET, POST, PUT, PATCH, and DELETE.
REST APIs use a stateless request model. HTTP requests should be independent and may occur in any
order, so keeping transient state information between requests is not feasible. The only place where
information is stored is in the resources themselves, and each request should be an atomic operation.
This constraint enables web services to be highly scalable, because there is no need to retain any affinity