Azure Architecture

Connect an on-premises network to Azure

This article compares options for connecting an on-premises network to an Azure Virtual
Network (VNet). For each option, a more detailed reference architecture is available.

VPN connection
A VPN gateway is a type of virtual network gateway that sends encrypted traffic between an
Azure virtual network and an on-premises location. The encrypted traffic goes over the public
Internet.

This architecture is suitable for hybrid applications where the traffic between on-premises
hardware and the cloud is likely to be light, or you are willing to trade slightly extended
latency for the flexibility and processing power of the cloud.

Benefits

 Simple to configure.

Challenges

 Requires an on-premises VPN device.


 Although Microsoft guarantees 99.9% availability for each VPN Gateway, this SLA
only covers the VPN gateway, and not your network connection to the gateway.
 A VPN connection over Azure VPN Gateway currently supports a maximum of 200
Mbps bandwidth. You may need to partition your Azure virtual network across
multiple VPN connections if you expect to exceed this throughput.

Azure ExpressRoute connection


ExpressRoute connections use a private, dedicated connection through a third-party
connectivity provider. The private connection extends your on-premises network into Azure.

This architecture is suitable for hybrid applications running large-scale, mission-critical workloads that require a high degree of scalability.

Benefits

 Much higher bandwidth available; up to 10 Gbps depending on the connectivity provider.
 Supports dynamic scaling of bandwidth to help reduce costs during periods of lower
demand. However, not all connectivity providers have this option.
 May allow your organization direct access to national clouds, depending on the
connectivity provider.
 99.9% availability SLA across the entire connection.
Challenges

 Can be complex to set up. Creating an ExpressRoute connection requires working with a third-party connectivity provider. The provider is responsible for provisioning the network connection.
 Requires high-bandwidth routers on-premises.

ExpressRoute with VPN failover


This option combines the previous two, using ExpressRoute in normal conditions, but failing over to a VPN connection if there is a loss of connectivity in the ExpressRoute circuit.

This architecture is suitable for hybrid applications that need the higher bandwidth of
ExpressRoute, and also require highly available network connectivity.

Benefits

 High availability if the ExpressRoute circuit fails, although the fallback connection is
on a lower bandwidth network.

Challenges

 Complex to configure. You need to set up both a VPN connection and an ExpressRoute circuit.
 Requires redundant hardware (VPN appliances), and a redundant Azure VPN
Gateway connection for which you pay charges.

Hub-spoke network topology


A hub-spoke network topology is a way to isolate workloads while sharing services such as
identity and security. The hub is a virtual network (VNet) in Azure that acts as a central point
of connectivity to your on-premises network. The spokes are VNets that peer with the hub.
Shared services are deployed in the hub, while individual workloads are deployed as spokes.
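Because the hub and every spoke are peered, their address spaces must not clash. A sketch of carving disjoint ranges for a hub and spokes out of a single supernet, using Python's `ipaddress` module (the `10.0.0.0/16` supernet and the /24 sizes are hypothetical):

```python
import ipaddress

# Hypothetical layout: carve non-overlapping /24 ranges for a hub
# and three spoke VNets out of a 10.0.0.0/16 supernet.
supernet = ipaddress.ip_network("10.0.0.0/16")
subnets = supernet.subnets(new_prefix=24)

hub = next(subnets)                         # 10.0.0.0/24
spokes = [next(subnets) for _ in range(3)]  # 10.0.1.0/24 .. 10.0.3.0/24

# Peering requires disjoint ranges; verify none of the spokes
# overlap the hub.
assert all(not s.overlaps(hub) for s in spokes)
print(hub, [str(s) for s in spokes])
```

Allocating all VNet ranges from one supernet up front makes it easy to add further spokes later without overlap.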
Connect an on-premises network to Azure
using a VPN gateway
This reference architecture shows how to extend an on-premises network to Azure, using a
site-to-site virtual private network (VPN). Traffic flows between the on-premises network
and an Azure Virtual Network (VNet) through an IPSec VPN tunnel.

Architecture
The architecture consists of the following components.

 On-premises network. A private local-area network running within an organization.
 VPN appliance. A device or service that provides external connectivity to the on-premises network. The VPN appliance may be a hardware device, or it can be a
software solution such as the Routing and Remote Access Service (RRAS) in
Windows Server 2012. For a list of supported VPN appliances and information on
configuring them to connect to an Azure VPN gateway, see the instructions for the
selected device in the article About VPN devices for Site-to-Site VPN Gateway
connections.
 Virtual network (VNet). The cloud application and the components for the Azure
VPN gateway reside in the same VNet.
 Azure VPN gateway. The VPN gateway service enables you to connect the VNet to
the on-premises network through a VPN appliance. For more information, see
Connect an on-premises network to a Microsoft Azure virtual network. The VPN
gateway includes the following elements:
o Virtual network gateway. A resource that provides a virtual VPN appliance for the
VNet. It is responsible for routing traffic from the on-premises network to the VNet.
o Local network gateway. An abstraction of the on-premises VPN appliance. Network
traffic from the cloud application to the on-premises network is routed through this
gateway.
o Connection. The connection has properties that specify the connection type (IPSec)
and the key shared with the on-premises VPN appliance to encrypt traffic.
o Gateway subnet. The virtual network gateway is held in its own subnet, which is
subject to various requirements, described in the Recommendations section below.

 Cloud application. The application hosted in Azure. It might include multiple tiers,
with multiple subnets connected through Azure load balancers. For more information
about the application infrastructure, see Running Windows VM workloads and
Running Linux VM workloads.
 Internal load balancer. Network traffic from the VPN gateway is routed to the cloud
application through an internal load balancer. The load balancer is located in the
front-end subnet of the application.

Recommendations
The following recommendations apply for most scenarios. Follow these recommendations
unless you have a specific requirement that overrides them.

VNet and gateway subnet

Create an Azure VNet with an address space large enough for all of your required resources.
Ensure that the VNet address space has sufficient room for growth if additional VMs are
likely to be needed in the future. The address space of the VNet must not overlap with the on-
premises network. For example, the diagram above uses the address space 10.20.0.0/16 for
the VNet.
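The overlap requirement can be checked mechanically. A small sketch using Python's `ipaddress` module (the `192.168.0.0/16` on-premises range is hypothetical; `10.20.0.0/16` is the VNet range from the diagram):

```python
import ipaddress

# Hypothetical on-premises range and the VNet range from the
# diagram; these two address spaces must not overlap.
on_prem = ipaddress.ip_network("192.168.0.0/16")
vnet = ipaddress.ip_network("10.20.0.0/16")

print(vnet.overlaps(on_prem))  # False -> safe to connect
```

Running the same check against every on-premises prefix advertised to Azure catches clashes before the VPN connection is created.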

Create a subnet named GatewaySubnet, with an address range of /27. This subnet is required
by the virtual network gateway. Allocating 32 addresses to this subnet will help to prevent
reaching gateway size limitations in the future. Also, avoid placing this subnet in the middle
of the address space. A good practice is to set the address space for the gateway subnet at the
upper end of the VNet address space. The example shown in the diagram uses
10.20.255.224/27. Here is a quick procedure to calculate the CIDR:

1. Set the variable bits in the address space of the VNet to 1, up to the bits being used by the
gateway subnet, then set the remaining bits to 0.
2. Convert the resulting bits to decimal and express it as an address space with the prefix
length set to the size of the gateway subnet.

For example, for a VNet with an IP address range of 10.20.0.0/16, applying step #1 above
becomes 10.20.0b11111111.0b11100000. Converting that to decimal and expressing it as an
address space yields 10.20.255.224/27.
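The bit-twiddling procedure above is equivalent to taking the highest /27 subnet of the VNet range, which Python's `ipaddress` module can compute directly. A sketch:

```python
import ipaddress

def gateway_subnet(vnet_cidr: str, new_prefix: int = 27) -> str:
    """Return the highest /new_prefix subnet of the VNet range,
    matching the guidance to place GatewaySubnet at the upper end
    of the address space."""
    vnet = ipaddress.ip_network(vnet_cidr)
    *_, last = vnet.subnets(new_prefix=new_prefix)
    return str(last)

print(gateway_subnet("10.20.0.0/16"))  # 10.20.255.224/27
```

This reproduces the worked example: for `10.20.0.0/16` the gateway subnet is `10.20.255.224/27`.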

Warning

Do not deploy any VMs to the gateway subnet. Also, do not assign an NSG to this subnet, as
it will cause the gateway to stop functioning.

Virtual network gateway

Allocate a public IP address for the virtual network gateway.


Create the virtual network gateway in the gateway subnet and assign it the newly allocated
public IP address. Use the gateway type that most closely matches your requirements and that
is enabled by your VPN appliance:

 Create a policy-based gateway if you need to closely control how requests are routed
based on policy criteria such as address prefixes. Policy-based gateways use static
routing, and only work with site-to-site connections.
 Create a route-based gateway if you connect to the on-premises network using RRAS,
support multi-site or cross-region connections, or implement VNet-to-VNet
connections (including routes that traverse multiple VNets). Route-based gateways
use dynamic routing to direct traffic between networks. They can tolerate failures in
the network path better than static routes because they can try alternative routes.
Route-based gateways can also reduce the management overhead because routes
might not need to be updated manually when network addresses change.

For a list of supported VPN appliances, see About VPN devices for Site-to-Site VPN
Gateway connections.

Note

After the gateway has been created, you cannot change between gateway types without
deleting and re-creating the gateway.

Select the Azure VPN gateway SKU that most closely matches your throughput requirements. For more information, see Gateway SKUs.

Note

The Basic SKU is not compatible with Azure ExpressRoute. You can change the SKU after
the gateway has been created.

You are charged based on the amount of time that the gateway is provisioned and available.
See VPN Gateway Pricing.

Create routing rules for the gateway subnet that direct incoming application traffic from the
gateway to the internal load balancer, rather than allowing requests to pass directly to the
application VMs.

On-premises network connection

Create a local network gateway. Specify the public IP address of the on-premises VPN
appliance, and the address space of the on-premises network. Note that the on-premises VPN
appliance must have a public IP address that can be accessed by the local network gateway in
Azure VPN Gateway. The VPN device cannot be located behind a network address
translation (NAT) device.

Create a site-to-site connection for the virtual network gateway and the local network
gateway. Select the site-to-site (IPSec) connection type, and specify the shared key. Site-to-
site encryption with the Azure VPN gateway is based on the IPSec protocol, using preshared
keys for authentication. You specify the key when you create the Azure VPN gateway. You
must configure the VPN appliance running on-premises with the same key. Other
authentication mechanisms are not currently supported.

Ensure that the on-premises routing infrastructure is configured to forward requests intended
for addresses in the Azure VNet to the VPN device.

Open any ports required by the cloud application in the on-premises network.

Test the connection to verify that:

 The on-premises VPN appliance correctly routes traffic to the cloud application through the
Azure VPN gateway.
 The VNet correctly routes traffic back to the on-premises network.
 Prohibited traffic in both directions is blocked correctly.

Scalability considerations
You can achieve limited vertical scalability by moving from the Basic or Standard VPN
Gateway SKUs to the High Performance VPN SKU.

For VNets that expect a large volume of VPN traffic, consider distributing the different
workloads into separate smaller VNets and configuring a VPN gateway for each of them.

You can partition the VNet either horizontally or vertically. To partition horizontally, move
some VM instances from each tier into subnets of the new VNet. The result is that each VNet
has the same structure and functionality. To partition vertically, redesign each tier to divide
the functionality into different logical areas (such as handling orders, invoicing, customer
account management, and so on). Each functional area can then be placed in its own VNet.

Replicating an on-premises Active Directory domain controller in the VNet, and implementing DNS in the VNet, can help to reduce some of the security-related and administrative traffic flowing from on-premises to the cloud. For more information, see Extending Active Directory Domain Services (AD DS) to Azure.

Availability considerations
If you need to ensure that the on-premises network remains available to the Azure VPN
gateway, implement a failover cluster for the on-premises VPN gateway.

If your organization has multiple on-premises sites, create multi-site connections to one or
more Azure VNets. This approach requires dynamic (route-based) routing, so make sure that
the on-premises VPN gateway supports this feature.

For details about service level agreements, see SLA for VPN Gateway.

Manageability considerations
Monitor diagnostic information from on-premises VPN appliances. This process depends on the features provided by the VPN appliance. For example, if you are using the Routing and Remote Access Service on Windows Server 2012, use RRAS logging.

Use Azure VPN gateway diagnostics to capture information about connectivity issues. These
logs can be used to track information such as the source and destinations of connection
requests, which protocol was used, and how the connection was established (or why the
attempt failed).

Monitor the operational logs of the Azure VPN gateway using the audit logs available in the
Azure portal. Separate logs are available for the local network gateway, the Azure network
gateway, and the connection. This information can be used to track any changes made to the
gateway, and can be useful if a previously functioning gateway stops working for some
reason.

Monitor connectivity, and track connectivity failure events. You can use a monitoring
package such as Nagios to capture and report this information.

Security considerations
Generate a different shared key for each VPN gateway. Use a strong shared key to help resist
brute-force attacks.
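A strong key can be generated rather than invented. A sketch in Python; the restriction to letters and digits is an assumption made here because some on-premises appliances reject special characters in pre-shared keys, so check your device's rules:

```python
import secrets
import string

def make_shared_key(length: int = 32) -> str:
    """Generate a random pre-shared key. Letters and digits are
    assumed safe for both Azure and typical on-premises appliances;
    verify the character rules for your specific VPN device."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

key = make_shared_key()
print(len(key))  # 32
```

Generate a distinct key per gateway, and distribute it to the on-premises appliance over a secure channel.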

Note

Currently, you cannot use Azure Key Vault to preshare keys for the Azure VPN gateway.

Ensure that the on-premises VPN appliance uses an encryption method that is compatible
with the Azure VPN gateway. For policy-based routing, the Azure VPN gateway supports the
AES256, AES128, and 3DES encryption algorithms. Route-based gateways support AES256
and 3DES.

If your on-premises VPN appliance is on a perimeter network (DMZ) that has a firewall
between the perimeter network and the Internet, you might have to configure additional
firewall rules to allow the site-to-site VPN connection.

If the application in the VNet sends data to the Internet, consider implementing forced
tunneling to route all Internet-bound traffic through the on-premises network. This approach
enables you to audit outgoing requests made by the application from the on-premises
infrastructure.

Note

Forced tunneling can impact connectivity to Azure services (the Storage Service, for
example) and the Windows license manager.
Troubleshooting
For general information on troubleshooting common VPN-related errors, see Troubleshooting
common VPN related errors.

The following recommendations are useful for determining if your on-premises VPN
appliance is functioning correctly.

 Check any log files generated by the VPN appliance for errors or failures.

This will help you determine if the VPN appliance is functioning correctly. The
location of this information will vary according to your appliance. For example, if you
are using RRAS on Windows Server 2012, you can use the following PowerShell
command to display error event information for the RRAS service:

PowerShell

Get-EventLog -LogName System -EntryType Error -Source RemoteAccess | Format-List -Property *

The Message property of each entry provides a description of the error. Some common
examples are:

- Inability to connect, possibly due to an incorrect IP address specified for the Azure VPN gateway in the RRAS VPN network interface configuration.

```
EventID : 20111
MachineName : on-prem-vm
Data : {41, 3, 0, 0}
Index : 14231
Category : (0)
CategoryNumber : 0
EntryType : Error
Message : RoutingDomainID- {00000000-0000-0000-0000-
000000000000}: A demand dial connection to the remote
interface AzureGateway on port VPN2-4 was
successfully initiated but failed to complete
successfully because of the following error: The
network connection between your computer and
the VPN server could not be established because the
remote server is not responding. This could
be because one of the network devices (for example,
firewalls, NAT, routers, and so on) between your computer
and the remote server is not configured to allow VPN
connections. Please contact your
Administrator or your service provider to determine
which device may be causing the problem.
Source : RemoteAccess
ReplacementStrings : {{00000000-0000-0000-0000-000000000000},
AzureGateway, VPN2-4, The network connection between
your computer and the VPN server could not be
established because the remote server is not
responding. This could be because one of the network
devices (for example, firewalls, NAT, routers, and so on)
between your computer and the remote server is not
configured to allow VPN connections. Please
contact your Administrator or your service provider
to determine which device may be causing the
problem.}
InstanceId : 20111
TimeGenerated : 3/18/2016 1:26:02 PM
TimeWritten : 3/18/2016 1:26:02 PM
UserName :
Site :
Container :
```

- The wrong shared key being specified in the RRAS VPN network interface
configuration.

```
EventID : 20111
MachineName : on-prem-vm
Data : {233, 53, 0, 0}
Index : 14245
Category : (0)
CategoryNumber : 0
EntryType : Error
Message : RoutingDomainID- {00000000-0000-0000-0000-
000000000000}: A demand dial connection to the remote
interface AzureGateway on port VPN2-4 was
successfully initiated but failed to complete
successfully because of the following error:
Internet key exchange (IKE) authentication credentials are unacceptable.

Source : RemoteAccess
ReplacementStrings : {{00000000-0000-0000-0000-000000000000},
AzureGateway, VPN2-4, IKE authentication credentials are
unacceptable.
}
InstanceId : 20111
TimeGenerated : 3/18/2016 1:34:22 PM
TimeWritten : 3/18/2016 1:34:22 PM
UserName :
Site :
Container :
```

You can also obtain event log information about attempts to connect through the RRAS
service using the following PowerShell command:

Get-EventLog -LogName Application -Source RasClient | Format-List -Property *

In the event of a failure to connect, this log will contain errors that look similar to the
following:

```
EventID : 20227
MachineName : on-prem-vm
Data : {}
Index : 4203
Category : (0)
CategoryNumber : 0
EntryType : Error
Message : CoId={B4000371-A67F-452F-AA4C-3125AA9CFC78}: The user
SYSTEM dialed a connection named
AzureGateway that has failed. The error code returned
on failure is 809.
Source : RasClient
ReplacementStrings : {{B4000371-A67F-452F-AA4C-3125AA9CFC78}, SYSTEM,
AzureGateway, 809}
InstanceId : 20227
TimeGenerated : 3/18/2016 1:29:21 PM
TimeWritten : 3/18/2016 1:29:21 PM
UserName :
Site :
Container :
```

 Verify connectivity and routing across the VPN gateway.

The VPN appliance may not be correctly routing traffic through the Azure VPN Gateway.
Use a tool such as PsPing to verify connectivity and routing across the VPN gateway. For
example, to test connectivity from an on-premises machine to a web server located on the
VNet, run the following command (replacing <<web-server-address>> with the address of
the web server):

PsPing -t <<web-server-address>>:80

If the on-premises machine can route traffic to the web server, you should see output similar
to the following:

D:\PSTools>psping -t 10.20.0.5:80

PsPing v2.01 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2014 Mark Russinovich
Sysinternals - www.sysinternals.com

TCP connect to 10.20.0.5:80:
Infinite iterations (warmup 1) connecting test:
Connecting to 10.20.0.5:80 (warmup): 6.21ms
Connecting to 10.20.0.5:80: 3.79ms
Connecting to 10.20.0.5:80: 3.44ms
Connecting to 10.20.0.5:80: 4.81ms

Sent = 3, Received = 3, Lost = 0 (0% loss),
Minimum = 3.44ms, Maximum = 4.81ms, Average = 4.01ms

If the on-premises machine cannot communicate with the specified destination, you will see
messages like this:

D:\PSTools>psping -t 10.20.1.6:80

PsPing v2.01 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2014 Mark Russinovich
Sysinternals - www.sysinternals.com

TCP connect to 10.20.1.6:80:
Infinite iterations (warmup 1) connecting test:
Connecting to 10.20.1.6:80 (warmup): This operation returned because the timeout period expired.
Connecting to 10.20.1.6:80: This operation returned because the timeout period expired.
Connecting to 10.20.1.6:80: This operation returned because the timeout period expired.
Connecting to 10.20.1.6:80: This operation returned because the timeout period expired.
Connecting to 10.20.1.6:80:
Sent = 3, Received = 0, Lost = 3 (100% loss),
Minimum = 0.00ms, Maximum = 0.00ms, Average = 0.00ms

 Verify that the on-premises firewall allows VPN traffic to pass and that the
correct ports are opened.
 Verify that the on-premises VPN appliance uses an encryption method that is
compatible with the Azure VPN gateway. For policy-based routing, the Azure VPN
gateway supports the AES256, AES128, and 3DES encryption algorithms. Route-
based gateways support AES256 and 3DES.

The following recommendations are useful for determining if there is a problem with the
Azure VPN gateway:

 Examine Azure VPN gateway diagnostic logs for potential issues.


 Verify that the Azure VPN gateway and on-premises VPN appliance are
configured with the same shared authentication key.

You can view the shared key stored by the Azure VPN gateway using the following
Azure CLI command:

azure network vpn-connection shared-key show <<resource-group>> <<vpn-connection-name>>

Use the command appropriate for your on-premises VPN appliance to show the shared key
configured for that appliance.

Verify that the GatewaySubnet subnet holding the Azure VPN gateway is not associated with
an NSG.

You can view the subnet details using the following Azure CLI command:

azure network vnet subnet show -g <<resource-group>> -e <<vnet-name>> -n GatewaySubnet

Ensure there is no data field named Network Security Group id. The following example
shows the results for an instance of the GatewaySubnet that has an assigned NSG (VPN-
Gateway-Group). This can prevent the gateway from working correctly if there are any rules
defined for this NSG.

C:\>azure network vnet subnet show -g profx-prod-rg -e profx-vnet -n GatewaySubnet
info: Executing command network vnet subnet show
+ Looking up virtual network "profx-vnet"
+ Looking up the subnet "GatewaySubnet"
data: Id : /subscriptions/########-####-####-####-############/resourceGroups/profx-prod-rg/providers/Microsoft.Network/virtualNetworks/profx-vnet/subnets/GatewaySubnet
data: Name : GatewaySubnet
data: Provisioning state : Succeeded
data: Address prefix : 10.20.3.0/27
data: Network Security Group id : /subscriptions/########-####-####-####-############/resourceGroups/profx-prod-rg/providers/Microsoft.Network/networkSecurityGroups/VPN-Gateway-Group
info: network vnet subnet show command OK

 Verify that the virtual machines in the Azure VNet are configured to permit traffic
coming in from outside the VNet.

Check any NSG rules associated with subnets containing these virtual machines. You can
view all NSG rules using the following Azure CLI command:

azure network nsg show -g <<resource-group>> -n <<nsg-name>>

 Verify that the Azure VPN gateway is connected.

You can use the following Azure PowerShell command to check the current status of the
Azure VPN connection. The <<connection-name>> parameter is the name of the Azure
VPN connection that links the virtual network gateway and the local gateway.

Get-AzureRmVirtualNetworkGatewayConnection -Name <<connection-name>> -ResourceGroupName <<resource-group>>

The following snippets highlight the output generated if the gateway is connected (the first
example), and disconnected (the second example):

PS C:\> Get-AzureRmVirtualNetworkGatewayConnection -Name profx-gateway-connection -ResourceGroupName profx-prod-rg

AuthorizationKey :
VirtualNetworkGateway1 : Microsoft.Azure.Commands.Network.Models.PSVirtualNetworkGateway
VirtualNetworkGateway2 :
LocalNetworkGateway2 : Microsoft.Azure.Commands.Network.Models.PSLocalNetworkGateway
Peer :
ConnectionType : IPsec
RoutingWeight : 0
SharedKey : ####################################
ConnectionStatus : Connected
EgressBytesTransferred : 55254803
IngressBytesTransferred : 32227221
ProvisioningState : Succeeded
...
PS C:\> Get-AzureRmVirtualNetworkGatewayConnection -Name profx-gateway-connection2 -ResourceGroupName profx-prod-rg

AuthorizationKey :
VirtualNetworkGateway1 : Microsoft.Azure.Commands.Network.Models.PSVirtualNetworkGateway
VirtualNetworkGateway2 :
LocalNetworkGateway2 : Microsoft.Azure.Commands.Network.Models.PSLocalNetworkGateway
Peer :
ConnectionType : IPsec
RoutingWeight : 0
SharedKey : ####################################
ConnectionStatus : NotConnected
EgressBytesTransferred : 0
IngressBytesTransferred : 0
ProvisioningState : Succeeded
...

The following recommendations are useful for determining if there is an issue with Host VM
configuration, network bandwidth utilization, or application performance:

 Verify that the firewall in the guest operating system running on the Azure VMs
in the subnet is configured correctly to allow permitted traffic from the on-
premises IP ranges.
 Verify that the volume of traffic is not close to the limit of the bandwidth
available to the Azure VPN gateway.

How to verify this depends on the VPN appliance running on-premises. For example, if you are using RRAS on Windows Server 2012, you can use Performance Monitor to track the volume of data being received and transmitted over the VPN connection. Using the RAS Total object, select the Bytes Received/Sec and Bytes Transmitted/Sec counters.

You should compare the results with the bandwidth available to the VPN gateway (100 Mbps for the Basic and Standard SKUs, and 200 Mbps for the High Performance SKU).
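The counter values are in bytes per second while the SKU limits are in megabits per second, so a small conversion is needed to judge headroom. A sketch:

```python
def gateway_utilization(bytes_per_sec: float, sku_mbps: float) -> float:
    """Fraction of the gateway's rated bandwidth in use, computed
    from a Bytes Received/Sec or Bytes Transmitted/Sec counter."""
    used_mbps = bytes_per_sec * 8 / 1_000_000
    return used_mbps / sku_mbps

# 9 MB/s through a 100 Mbps (Basic/Standard) gateway:
print(round(gateway_utilization(9_000_000, 100), 2))  # 0.72
```

A sustained value near 1.0 suggests the traffic is approaching the gateway limit and a higher SKU, or partitioning across multiple VNets, is warranted.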

 Verify that you have deployed the right number and size of VMs for your
application load.

Determine if any of the virtual machines in the Azure VNet are running slowly. If so,
they may be overloaded, there may be too few to handle the load, or the load-
balancers may not be configured correctly. To determine this, capture and analyze
diagnostic information. You can examine the results using the Azure portal, but many
third-party tools are also available that can provide detailed insights into the
performance data.

 Verify that the application is making efficient use of cloud resources.

Instrument application code running on each VM to determine whether applications are making the best use of resources. You can use tools such as Application Insights.
Deploy the solution
Prerequisites. You must have an existing on-premises infrastructure already configured with a suitable network appliance.

To deploy the solution, perform the following steps.

1. Click the button below:

2. Wait for the link to open in the Azure portal, then follow these steps:
o The Resource group name is already defined in the parameter file, so select Create
New and enter ra-hybrid-vpn-rg in the text box.
o Select the region from the Location drop down box.
o Do not edit the Template Root Uri or the Parameter Root Uri text boxes.
o Review the terms and conditions, then click the I agree to the terms and conditions
stated above checkbox.
o Click the Purchase button.
3. Wait for the deployment to complete.

Connect an on-premises network to Azure using ExpressRoute
This reference architecture shows how to connect an on-premises network to virtual networks
on Azure, using Azure ExpressRoute. ExpressRoute connections use a private, dedicated
connection through a third-party connectivity provider. The private connection extends your
on-premises network into Azure.

Architecture
The architecture consists of the following components.

 On-premises corporate network. A private local-area network running within an organization.
 ExpressRoute circuit. A layer 2 or layer 3 circuit supplied by the connectivity
provider that joins the on-premises network with Azure through the edge routers. The
circuit uses the hardware infrastructure managed by the connectivity provider.
 Local edge routers. Routers that connect the on-premises network to the circuit
managed by the provider. Depending on how your connection is provisioned, you
may need to provide the public IP addresses used by the routers.
 Microsoft edge routers. Two routers in an active-active highly available
configuration. These routers enable a connectivity provider to connect their circuits
directly to their datacenter. Depending on how your connection is provisioned, you
may need to provide the public IP addresses used by the routers.
 Azure virtual networks (VNets). Each VNet resides in a single Azure region, and
can host multiple application tiers. Application tiers can be segmented using subnets
in each VNet.
 Azure public services. Azure services that can be used within a hybrid application.
These services are also available over the Internet, but accessing them using an
ExpressRoute circuit provides low latency and more predictable performance, because
traffic does not go through the Internet. Connections are performed using public
peering, with addresses that are either owned by your organization or supplied by
your connectivity provider.
 Office 365 services. The publicly available Office 365 applications and services
provided by Microsoft. Connections are performed using Microsoft peering, with
addresses that are either owned by your organization or supplied by your connectivity
provider. You can also connect directly to Microsoft CRM Online through Microsoft
peering.
 Connectivity providers (not shown). Companies that provide a connection either
using layer 2 or layer 3 connectivity between your datacenter and an Azure
datacenter.

Recommendations
The following recommendations apply for most scenarios. Follow these recommendations
unless you have a specific requirement that overrides them.

Connectivity providers

Select a suitable ExpressRoute connectivity provider for your location. To get a list of
connectivity providers available at your location, use the following Azure PowerShell
command:

PowerShell
Get-AzureRmExpressRouteServiceProvider

ExpressRoute connectivity providers connect your datacenter to Microsoft in the following ways:

 Co-located at a cloud exchange. If you're co-located in a facility with a cloud exchange, you
can order virtual cross-connections to Azure through the co-location provider’s Ethernet
exchange. Co-location providers can offer either layer 2 cross-connections, or managed layer
3 cross-connections between your infrastructure in the co-location facility and Azure.
 Point-to-point Ethernet connections. You can connect your on-premises datacenters/offices
to Azure through point-to-point Ethernet links. Point-to-point Ethernet providers can offer
layer 2 connections, or managed layer 3 connections between your site and Azure.
 Any-to-any (IPVPN) networks. You can integrate your wide area network (WAN) with Azure.
Internet protocol virtual private network (IPVPN) providers (typically a multiprotocol label
switching VPN) offer any-to-any connectivity between your branch offices and datacenters.
Azure can be interconnected to your WAN to make it look just like any other branch office.
WAN providers typically offer managed layer 3 connectivity.

For more information about connectivity providers, see the ExpressRoute introduction.

ExpressRoute circuit

Ensure that your organization has met the ExpressRoute prerequisite requirements for
connecting to Azure.

If you haven't already done so, add a subnet named GatewaySubnet to your Azure VNet and
create an ExpressRoute virtual network gateway using the Azure VPN gateway service. For
more information about this process, see ExpressRoute workflows for circuit provisioning
and circuit states.
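If you're scripting these steps, the subnet and gateway creation can be sketched with the classic AzureRM cmdlets as follows. This is a sketch, not a definitive procedure: the resource names are placeholders, and the address prefix and gateway SKU are assumptions you should adjust for your environment.

```powershell
# Add a GatewaySubnet to the VNet (address prefix shown is a placeholder)
$vnet = Get-AzureRmVirtualNetwork -Name <<vnet-name>> -ResourceGroupName <<resource-group>>
Add-AzureRmVirtualNetworkSubnetConfig -Name "GatewaySubnet" -VirtualNetwork $vnet -AddressPrefix "10.0.255.224/27"
$vnet = Set-AzureRmVirtualNetwork -VirtualNetwork $vnet

# Create the ExpressRoute virtual network gateway in that subnet
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name "GatewaySubnet" -VirtualNetwork $vnet
$pip = New-AzureRmPublicIpAddress -Name <<pip-name>> -ResourceGroupName <<resource-group>> -Location <<location>> -AllocationMethod Dynamic
$ipconf = New-AzureRmVirtualNetworkGatewayIpConfig -Name "gwipconf" -SubnetId $subnet.Id -PublicIpAddressId $pip.Id
New-AzureRmVirtualNetworkGateway -Name <<gateway-name>> -ResourceGroupName <<resource-group>> -Location <<location>> `
    -IpConfigurations $ipconf -GatewayType ExpressRoute -GatewaySku Standard
```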

Create an ExpressRoute circuit as follows:

1. Run the following PowerShell command:

PowerShell
New-AzureRmExpressRouteCircuit -Name <<circuit-name>> -ResourceGroupName <<resource-group>> -Location <<location>> -SkuTier <<sku-tier>> -SkuFamily <<sku-family>> -ServiceProviderName <<service-provider-name>> -PeeringLocation <<peering-location>> -BandwidthInMbps <<bandwidth-in-mbps>>

2. Send the ServiceKey for the new circuit to the service provider.

3. Wait for the provider to provision the circuit. To verify the provisioning state of a circuit, run the following PowerShell command:

PowerShell
Get-AzureRmExpressRouteCircuit -Name <<circuit-name>> -ResourceGroupName <<resource-group>>

The Provisioning state field in the Service Provider section of the output will change
from NotProvisioned to Provisioned when the circuit is ready.
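Provisioning can take some time. One way to wait for completion in a script is to poll the circuit object until the provider state changes. This is a sketch assuming the ServiceProviderProvisioningState property shown in the troubleshooting output later in this article, with a five-minute polling interval chosen arbitrarily:

```powershell
# Poll until the connectivity provider has provisioned the circuit
do {
    Start-Sleep -Seconds 300
    $ckt = Get-AzureRmExpressRouteCircuit -Name <<circuit-name>> -ResourceGroupName <<resource-group>>
    Write-Output "Provider state: $($ckt.ServiceProviderProvisioningState)"
} while ($ckt.ServiceProviderProvisioningState -ne "Provisioned")
```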

Note

If you're using a layer 3 connection, the provider should configure and manage routing for
you. You provide the information necessary to enable the provider to implement the
appropriate routes.
 If you're using a layer 2 connection:

1. Reserve two /30 subnets composed of valid public IP addresses for each type of peering you want to implement. These /30 subnets will be used to provide IP addresses for the routers used for the circuit. If you are implementing private, public, and Microsoft peering, you'll need six /30 subnets with valid public IP addresses.
2. Configure routing for the ExpressRoute circuit. Run the following PowerShell commands for each type of peering you want to configure (private, public, and Microsoft). For more information, see Create and modify routing for an ExpressRoute circuit.

PowerShell
Set-AzureRmExpressRouteCircuitPeeringConfig -Name <<peering-name>> -Circuit <<circuit-name>> -PeeringType <<peering-type>> -PeerASN <<peer-asn>> -PrimaryPeerAddressPrefix <<primary-peer-address-prefix>> -SecondaryPeerAddressPrefix <<secondary-peer-address-prefix>> -VlanId <<vlan-id>>
Set-AzureRmExpressRouteCircuit -ExpressRouteCircuit <<circuit-name>>

3. Reserve another pool of valid public IP addresses to use for network address translation (NAT) for public and Microsoft peering. It is recommended to have a different pool for each peering. Specify the pool to your connectivity provider, so they can configure border gateway protocol (BGP) advertisements for those ranges.
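As a concrete illustration, a private peering configuration might look like the following. The ASN, /30 prefixes, and VLAN ID are placeholder example values, and this sketch assumes the `-ExpressRouteCircuit` parameter name used by the AzureRM module, passing the circuit object rather than its name:

```powershell
# Example private peering: two /30 subnets, one per redundant BGP session (placeholder values)
$ckt = Get-AzureRmExpressRouteCircuit -Name <<circuit-name>> -ResourceGroupName <<resource-group>>
Set-AzureRmExpressRouteCircuitPeeringConfig -Name "AzurePrivatePeering" -ExpressRouteCircuit $ckt `
    -PeeringType AzurePrivatePeering -PeerASN 65010 `
    -PrimaryPeerAddressPrefix "192.168.10.0/30" -SecondaryPeerAddressPrefix "192.168.10.4/30" `
    -VlanId 100
Set-AzureRmExpressRouteCircuit -ExpressRouteCircuit $ckt
```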

 Run the following PowerShell commands to link your private VNet(s) to the ExpressRoute circuit. For more information, see Link a virtual network to an ExpressRoute circuit.

PowerShell
$circuit = Get-AzureRmExpressRouteCircuit -Name <<circuit-name>> -ResourceGroupName <<resource-group>>
$gw = Get-AzureRmVirtualNetworkGateway -Name <<gateway-name>> -ResourceGroupName <<resource-group>>
New-AzureRmVirtualNetworkGatewayConnection -Name <<connection-name>> -ResourceGroupName <<resource-group>> -Location <<location>> -VirtualNetworkGateway1 $gw -PeerId $circuit.Id -ConnectionType ExpressRoute

You can connect multiple VNets located in different regions to the same ExpressRoute
circuit, as long as all VNets and the ExpressRoute circuit are located within the same
geopolitical region.

Troubleshooting

If a previously functioning ExpressRoute circuit now fails to connect, and no configuration changes have been made on-premises or within your private VNet, you may need to contact the connectivity provider and work with them to correct the issue. Use the following PowerShell command to verify that the ExpressRoute circuit has been provisioned:
PowerShell
Get-AzureRmExpressRouteCircuit -Name <<circuit-name>> -ResourceGroupName
<<resource-group>>

The output of this command shows several properties for your circuit, including
ProvisioningState, CircuitProvisioningState, and
ServiceProviderProvisioningState as shown below.

ProvisioningState : Succeeded
Sku : {
"Name": "Standard_MeteredData",
"Tier": "Standard",
"Family": "MeteredData"
}
CircuitProvisioningState : Enabled
ServiceProviderProvisioningState : NotProvisioned

If the ProvisioningState is not set to Succeeded after you tried to create a new circuit,
remove the circuit by using the command below and try to create it again.

PowerShell
Remove-AzureRmExpressRouteCircuit -Name <<circuit-name>> -ResourceGroupName
<<resource-group>>

If your provider has already provisioned the circuit, and the ProvisioningState is set to
Failed, or the CircuitProvisioningState is not Enabled, contact your provider for
further assistance.

Scalability considerations
ExpressRoute circuits provide a high bandwidth path between networks. Generally, the
higher the bandwidth the greater the cost.

ExpressRoute offers two pricing plans to customers, a metered plan and an unlimited data
plan. Charges vary according to circuit bandwidth. Available bandwidth will likely vary from
provider to provider. Use the Get-AzureRmExpressRouteServiceProvider cmdlet to see
the providers available in your region and the bandwidths that they offer.

A single ExpressRoute circuit can support a certain number of peerings and VNet links. See
ExpressRoute limits for more information.

For an extra charge, the ExpressRoute Premium add-on provides some additional capabilities:

 Increased route limits for public and private peering.
 Increased number of VNet links per ExpressRoute circuit.
 Global connectivity for services.

See ExpressRoute pricing for details.


ExpressRoute circuits are designed to allow temporary network bursts up to two times the
bandwidth limit that you procured for no additional cost. This is achieved by using redundant
links. However, not all connectivity providers support this feature. Verify that your
connectivity provider enables this feature before depending on it.

Although some providers allow you to change your bandwidth, make sure you pick an initial
bandwidth that surpasses your needs and provides room for growth. If you need to increase
bandwidth in the future, you are left with two options:

 Increase the bandwidth. Avoid this option if possible; not all providers allow you to increase bandwidth dynamically. If a bandwidth increase is needed, check with your provider to verify that they support changing ExpressRoute bandwidth properties through PowerShell commands. If they do, run the commands below.

PowerShell
$ckt = Get-AzureRmExpressRouteCircuit -Name <<circuit-name>> -ResourceGroupName <<resource-group>>
$ckt.ServiceProviderProperties.BandwidthInMbps = <<bandwidth-in-mbps>>
Set-AzureRmExpressRouteCircuit -ExpressRouteCircuit $ckt

You can increase the bandwidth without loss of connectivity. Downgrading the bandwidth
will result in disruption in connectivity, because you must delete the circuit and recreate it
with the new configuration.

 Change your pricing plan and/or upgrade to Premium. To do so, run the following
commands. The Sku.Tier property can be Standard or Premium; the Sku.Name property can
be MeteredData or UnlimitedData.

PowerShell
$ckt = Get-AzureRmExpressRouteCircuit -Name <<circuit-name>> -ResourceGroupName <<resource-group>>
$ckt.Sku.Tier = "Premium"
$ckt.Sku.Family = "MeteredData"
$ckt.Sku.Name = "Premium_MeteredData"
Set-AzureRmExpressRouteCircuit -ExpressRouteCircuit $ckt

Important

Make sure the Sku.Name property matches the Sku.Tier and Sku.Family. If you
change the family and tier, but not the name, your connection will be disabled.

You can upgrade the SKU without disruption, but you cannot switch from the
unlimited pricing plan to metered. When downgrading the SKU, your bandwidth
consumption must remain within the default limit of the standard SKU.

Availability considerations
ExpressRoute does not support router redundancy protocols such as hot standby routing protocol (HSRP) and virtual router redundancy protocol (VRRP) to implement high availability. Instead, it uses a redundant pair of BGP sessions per peering. To facilitate highly available connections to your network, Azure provisions you with two redundant ports on two routers (part of the Microsoft edge) in an active-active configuration.

By default, BGP sessions use an idle timeout value of 60 seconds. If a session times out three
times (180 seconds total), the router is marked as unavailable, and all traffic is redirected to
the remaining router. This 180-second timeout might be too long for critical applications. If
so, you can change your BGP time-out settings on the on-premises router to a smaller value.

You can configure high availability for your Azure connection in different ways, depending
on the type of provider you use, and the number of ExpressRoute circuits and virtual network
gateway connections you're willing to configure. The following summarizes your availability
options:

 If you're using a layer 2 connection, deploy redundant routers in your on-premises network in an active-active configuration. Connect the primary circuit to one router, and the secondary circuit to the other. This will give you a highly available connection at both ends of the connection. This is necessary if you require the ExpressRoute service level agreement (SLA). See SLA for Azure ExpressRoute for details.

The following diagram shows a configuration with redundant on-premises routers connected to the primary and secondary circuits. Each circuit handles the traffic for a public peering and a private peering (each peering is designated a pair of /30 address spaces, as described in the previous section).

 If you're using a layer 3 connection, verify that it provides redundant BGP sessions
that handle availability for you.
 Connect the VNet to multiple ExpressRoute circuits, supplied by different service
providers. This strategy provides additional high-availability and disaster recovery
capabilities.
 Configure a site-to-site VPN as a failover path for ExpressRoute. For more about this
option, see Connect an on-premises network to Azure using ExpressRoute with VPN
failover. This option only applies to private peering. For Azure and Office 365
services, the Internet is the only failover path.

Manageability considerations
You can use the Azure Connectivity Toolkit (AzureCT) to monitor connectivity between
your on-premises datacenter and Azure.

Security considerations
You can configure security options for your Azure connection in different ways, depending
on your security concerns and compliance needs.

ExpressRoute operates in layer 3. Threats in the application layer can be prevented by using a
network security appliance that restricts traffic to legitimate resources. Additionally,
ExpressRoute connections using public peering can only be initiated from on-premises. This
prevents a rogue service from accessing and compromising on-premises data from the
Internet.

To maximize security, add network security appliances between the on-premises network and the provider edge routers. This will help to restrict the inflow of unauthorized traffic from the VNet.

For auditing or compliance purposes, it may be necessary to prohibit direct access from
components running in the VNet to the Internet and implement forced tunneling. In this
situation, Internet traffic should be redirected back through a proxy running on-premises
where it can be audited. The proxy can be configured to block unauthorized traffic flowing
out, and filter potentially malicious inbound traffic.
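One way to implement this redirection is a user-defined route that sends Internet-bound traffic (0.0.0.0/0) to the virtual network gateway, from where it can flow over the circuit to the on-premises proxy. The following is a sketch using classic AzureRM cmdlets; the resource names are placeholders, and it assumes that a default route is also advertised to the circuit over private peering so the gateway has somewhere to send the traffic:

```powershell
# Force all Internet-bound traffic from the subnet through the virtual network gateway
$route = New-AzureRmRouteConfig -Name "ForceOnPremises" -AddressPrefix "0.0.0.0/0" -NextHopType VirtualNetworkGateway
$routeTable = New-AzureRmRouteTable -Name <<route-table-name>> -ResourceGroupName <<resource-group>> -Location <<location>> -Route $route

# Associate the route table with the application subnet
$vnet = Get-AzureRmVirtualNetwork -Name <<vnet-name>> -ResourceGroupName <<resource-group>>
Set-AzureRmVirtualNetworkSubnetConfig -Name <<subnet-name>> -VirtualNetwork $vnet -AddressPrefix <<subnet-prefix>> -RouteTable $routeTable
Set-AzureRmVirtualNetwork -VirtualNetwork $vnet
```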

To maximize security, do not enable a public IP address for your VMs, and use NSGs to
ensure that these VMs aren't publicly accessible. VMs should only be available using the
internal IP address. These addresses can be made accessible through the ExpressRoute
network, enabling on-premises DevOps staff to perform configuration or maintenance.

If you must expose management endpoints for VMs to an external network, use NSGs or
access control lists to restrict the visibility of these ports to a whitelist of IP addresses or
networks.
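For example, an NSG rule restricting RDP management access to a single trusted address range might be sketched as follows. The rule name, priority, and source range are illustrative assumptions, not prescribed values:

```powershell
# Allow RDP only from a trusted on-premises management range;
# traffic not matched here falls through to the NSG's default rules
$mgmtRule = New-AzureRmNetworkSecurityRuleConfig -Name "AllowRdpFromMgmt" `
    -Description "RDP from trusted management network only" `
    -Access Allow -Protocol Tcp -Direction Inbound -Priority 100 `
    -SourceAddressPrefix "192.168.100.0/24" -SourcePortRange * `
    -DestinationAddressPrefix * -DestinationPortRange 3389
New-AzureRmNetworkSecurityGroup -Name <<nsg-name>> -ResourceGroupName <<resource-group>> -Location <<location>> -SecurityRules $mgmtRule
```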

Note

By default, Azure VMs deployed through the Azure portal include a public IP address that
provides login access.

Deploy the solution

Prerequisites. You must have an existing on-premises infrastructure already configured with a suitable network appliance.

To deploy the solution, perform the following steps.

1. Click the button below:

2. Wait for the link to open in the Azure portal, then follow these steps:
o The Resource group name is already defined in the parameter file, so select Create
New and enter ra-hybrid-er-rg in the text box.
o Select the region from the Location drop down box.
o Do not edit the Template Root Uri or the Parameter Root Uri text boxes.
o Review the terms and conditions, then click the I agree to the terms and conditions
stated above checkbox.
o Click the Purchase button.
3. Wait for the deployment to complete.
4. Click the button below:

5. Wait for the link to open in the Azure portal, then follow these steps:
o Select Use existing in the Resource group section and enter ra-hybrid-er-rg in
the text box.
o Select the region from the Location drop down box.
o Do not edit the Template Root Uri or the Parameter Root Uri text boxes.
o Review the terms and conditions, then click the I agree to the terms and conditions
stated above checkbox.
o Click the Purchase button.
6. Wait for the deployment to complete.

Connect an on-premises network to Azure using ExpressRoute with VPN failover
This reference architecture shows how to connect an on-premises network to an Azure virtual
network (VNet) using ExpressRoute, with a site-to-site virtual private network (VPN) as a
failover connection. Traffic flows between the on-premises network and the Azure VNet
through an ExpressRoute connection. If there is a loss of connectivity in the ExpressRoute
circuit, traffic is routed through an IPSec VPN tunnel.

Note that if the ExpressRoute circuit is unavailable, the VPN route will only handle private
peering connections. Public peering and Microsoft peering connections will pass over the
Internet.
Architecture
The architecture consists of the following components.

 On-premises network. A private local-area network running within an organization.
 VPN appliance. A device or service that provides external connectivity to the on-premises network. The VPN appliance may be a hardware device, or it can be a software solution such as the Routing and Remote Access Service (RRAS) in Windows Server 2012. For a list of supported VPN appliances and information on configuring selected VPN appliances for connecting to Azure, see About VPN devices for Site-to-Site VPN Gateway connections.
 ExpressRoute circuit. A layer 2 or layer 3 circuit supplied by the connectivity
provider that joins the on-premises network with Azure through the edge routers. The
circuit uses the hardware infrastructure managed by the connectivity provider.
 ExpressRoute virtual network gateway. The ExpressRoute virtual network gateway
enables the VNet to connect to the ExpressRoute circuit used for connectivity with
your on-premises network.
 VPN virtual network gateway. The VPN virtual network gateway enables the VNet
to connect to the VPN appliance in the on-premises network. The VPN virtual
network gateway is configured to accept requests from the on-premises network only
through the VPN appliance. For more information, see Connect an on-premises
network to a Microsoft Azure virtual network.
 VPN connection. The connection has properties that specify the connection type
(IPSec) and the key shared with the on-premises VPN appliance to encrypt traffic.
 Azure Virtual Network (VNet). Each VNet resides in a single Azure region, and can
host multiple application tiers. Application tiers can be segmented using subnets in
each VNet.
 Gateway subnet. The virtual network gateways are held in the same subnet.
 Cloud application. The application hosted in Azure. It might include multiple tiers,
with multiple subnets connected through Azure load balancers. For more information
about the application infrastructure, see Running Windows VM workloads and
Running Linux VM workloads.

Recommendations
The following recommendations apply for most scenarios. Follow these recommendations
unless you have a specific requirement that overrides them.

VNet and GatewaySubnet

Create the ExpressRoute virtual network gateway and the VPN virtual network gateway in
the same VNet. This means that they should share the same subnet named GatewaySubnet.

If the VNet already includes a subnet named GatewaySubnet, ensure that it has a /27 or larger
address space. If the existing subnet is too small, use the following PowerShell command to
remove the subnet:

PowerShell
$vnet = Get-AzureRmVirtualNetwork -Name <yourvnetname> -ResourceGroupName <yourresourcegroup>
Remove-AzureRmVirtualNetworkSubnetConfig -Name GatewaySubnet -VirtualNetwork $vnet
Set-AzureRmVirtualNetwork -VirtualNetwork $vnet

If the VNet does not contain a subnet named GatewaySubnet, create a new one using the following PowerShell commands:

PowerShell
$vnet = Get-AzureRmVirtualNetwork -Name <yourvnetname> -ResourceGroupName <yourresourcegroup>
Add-AzureRmVirtualNetworkSubnetConfig -Name "GatewaySubnet" -VirtualNetwork $vnet -AddressPrefix "10.200.255.224/27"
$vnet = Set-AzureRmVirtualNetwork -VirtualNetwork $vnet

VPN and ExpressRoute gateways

Verify that your organization meets the ExpressRoute prerequisite requirements for
connecting to Azure.

If you already have a VPN virtual network gateway in your Azure VNet, use the following PowerShell command to remove it:

PowerShell
Remove-AzureRmVirtualNetworkGateway -Name <yourgatewayname> -
ResourceGroupName <yourresourcegroup>

Follow the instructions in Implementing a hybrid network architecture with Azure ExpressRoute to establish your ExpressRoute connection.

Follow the instructions in Implementing a hybrid network architecture with Azure and On-
premises VPN to establish your VPN virtual network gateway connection.

After you have established the virtual network gateway connections, test the environment as
follows:

1. Make sure you can connect from your on-premises network to your Azure VNet.
2. Contact your provider to stop ExpressRoute connectivity for testing.
3. Verify that you can still connect from your on-premises network to your Azure VNet using
the VPN virtual network gateway connection.
4. Contact your provider to reestablish ExpressRoute connectivity.
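During the test, one way to watch the failover from the Azure side is to query the status of each virtual network gateway connection. This sketch assumes placeholder connection names for the ExpressRoute and VPN connections created earlier:

```powershell
# Check the status of the ExpressRoute and VPN connections during failover testing
Get-AzureRmVirtualNetworkGatewayConnection -Name <<er-connection-name>> -ResourceGroupName <<resource-group>> |
    Select-Object Name, ConnectionType, ConnectionStatus
Get-AzureRmVirtualNetworkGatewayConnection -Name <<vpn-connection-name>> -ResourceGroupName <<resource-group>> |
    Select-Object Name, ConnectionType, ConnectionStatus
```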

Considerations
For ExpressRoute considerations, see the Implementing a Hybrid Network Architecture with
Azure ExpressRoute guidance.

For site-to-site VPN considerations, see the Implementing a Hybrid Network Architecture
with Azure and On-premises VPN guidance.

For general Azure security considerations, see Microsoft cloud services and network security.
Deploy the solution
Prerequisites. You must have an existing on-premises infrastructure already configured with a suitable network appliance.

To deploy the solution, perform the following steps.

1. Click the button below:

2. Wait for the link to open in the Azure portal, then follow these steps:
o The Resource group name is already defined in the parameter file, so select Create
New and enter ra-hybrid-vpn-er-rg in the text box.
o Select the region from the Location drop down box.
o Do not edit the Template Root Uri or the Parameter Root Uri text boxes.
o Review the terms and conditions, then click the I agree to the terms and conditions
stated above checkbox.
o Click the Purchase button.
3. Wait for the deployment to complete.
4. Click the button below:

5. Wait for the link to open in the Azure portal, then follow these steps:
o Select Use existing in the Resource group section and enter ra-hybrid-vpn-er-
rg in the text box.
o Select the region from the Location drop down box.
o Do not edit the Template Root Uri or the Parameter Root Uri text boxes.
o Review the terms and conditions, then click the I agree to the terms and conditions
stated above checkbox.
o Click the Purchase button.

Choose a solution for integrating on-premises Active Directory with Azure
This article compares options for integrating your on-premises Active Directory (AD)
environment with an Azure network. For each option, a more detailed reference architecture
is available.

Many organizations use Active Directory Domain Services (AD DS) to authenticate identities
associated with users, computers, applications, or other resources that are included in a
security boundary. Directory and identity services are typically hosted on-premises, but if
your application is hosted partly on-premises and partly in Azure, there may be latency
sending authentication requests from Azure back to on-premises. Implementing directory and
identity services in Azure can reduce this latency.

Azure provides two solutions for implementing directory and identity services in Azure:
 Use Azure AD to create an Active Directory domain in the cloud and connect it to
your on-premises Active Directory domain. Azure AD Connect integrates your on-
premises directories with Azure AD.
 Extend your existing on-premises Active Directory infrastructure to Azure, by
deploying a VM in Azure that runs AD DS as a domain controller. This architecture is
more common when the on-premises network and the Azure virtual network (VNet)
are connected by a VPN or ExpressRoute connection. Several variations of this
architecture are possible:
o Create a domain in Azure and join it to your on-premises AD forest.
o Create a separate forest in Azure that is trusted by domains in your on-
premises forest.
o Replicate an Active Directory Federation Services (AD FS) deployment to
Azure.

The next sections describe each of these options in more detail.

Integrate your on-premises domains with Azure AD

Use Azure Active Directory (Azure AD) to create a domain in Azure and link it to an on-premises AD domain.

The Azure AD directory is not an extension of an on-premises directory. Rather, it's a copy
that contains the same objects and identities. Changes made to these items on-premises are
copied to Azure AD, but changes made in Azure AD are not replicated back to the on-
premises domain.

You can also use Azure AD without using an on-premises directory. In this case, Azure AD
acts as the primary source of all identity information, rather than containing data replicated
from an on-premises directory.

Benefits

 You don't need to maintain an AD infrastructure in the cloud. Azure AD is entirely managed and maintained by Microsoft.
 Azure AD provides the same identity information that is available on-premises.
 Authentication can happen in Azure, reducing the need for external applications and
users to contact the on-premises domain.

Challenges

 Identity services are limited to users and groups. There is no ability to authenticate
service and computer accounts.
 You must configure connectivity with your on-premises domain to keep the Azure
AD directory synchronized.
 Applications may need to be rewritten to enable authentication through Azure AD.

Reference architecture

 Integrate on-premises Active Directory domains with Azure Active Directory


AD DS in Azure joined to an on-premises forest
Deploy AD Domain Services (AD DS) servers to Azure. Create a domain in Azure and join it
to your on-premises AD forest.

Consider this option if you need to use AD DS features that are not currently implemented by
Azure AD.

Benefits

 Provides access to the same identity information that is available on-premises.
 You can authenticate user, service, and computer accounts on-premises and in Azure.
 You don't need to manage a separate AD forest. The domain in Azure can belong to
the on-premises forest.
 You can apply group policy defined by on-premises Group Policy Objects to the
domain in Azure.

Challenges

 You must deploy and manage your own AD DS servers and domain in the cloud.
 There may be some synchronization latency between the domain servers in the cloud
and the servers running on-premises.

Reference architecture

 Extend Active Directory Domain Services (AD DS) to Azure

AD DS in Azure with a separate forest

Deploy AD Domain Services (AD DS) servers to Azure, but create an Active Directory forest that is separate from the on-premises forest. This forest is trusted by domains in your on-premises forest.

Typical uses for this architecture include maintaining security separation for objects and
identities held in the cloud, and migrating individual domains from on-premises to the cloud.

Benefits

 You can implement on-premises identities and separate Azure-only identities.
 You don't need to replicate from the on-premises AD forest to Azure.

Challenges

 Authentication within Azure for on-premises identities requires extra network hops to
the on-premises AD servers.
 You must deploy your own AD DS servers and forest in the cloud, and establish the
appropriate trust relationships between forests.

Reference architecture
 Create an Active Directory Domain Services (AD DS) resource forest in Azure

Extend AD FS to Azure
Replicate an Active Directory Federation Services (AD FS) deployment to Azure, to perform
federated authentication and authorization for components running in Azure.

Typical uses for this architecture:

 Authenticate and authorize users from partner organizations.
 Allow users to authenticate from web browsers running outside of the organizational firewall.
 Allow users to connect from authorized external devices such as mobile devices.

Benefits

 You can leverage claims-aware applications.
 Provides the ability to trust external partners for authentication.
 Compatibility with a large set of authentication protocols.

Challenges

 You must deploy your own AD DS, AD FS, and AD FS Web Application Proxy
servers in Azure.
 This architecture can be complex to configure.

Integrate on-premises Active Directory domains with Azure Active Directory

Azure Active Directory (Azure AD) is a cloud-based multi-tenant directory and identity service. This reference architecture shows best practices for integrating on-premises Active Directory domains with Azure AD to provide cloud-based identity authentication.
Download a Visio file of this architecture.
Download a Visio file of this architecture.

Note

For simplicity, this diagram only shows the connections directly related to Azure AD, and not
protocol-related traffic that may occur as part of authentication and identity federation. For
example, a web application may redirect the web browser to authenticate the request through
Azure AD. Once authenticated, the request can be passed back to the web application, with
the appropriate identity information.

Typical uses for this reference architecture include:

 Web applications deployed in Azure that provide access to remote users who belong to your
organization.
 Implementing self-service capabilities for end-users, such as resetting their passwords, and
delegating group management. Note that this requires Azure AD Premium edition.
 Architectures in which the on-premises network and the application's Azure VNet are not
connected using a VPN tunnel or ExpressRoute circuit.

Note

Azure AD can authenticate the identity of users and applications that exist in an
organization’s directory. Some applications and services, such as SQL Server, may require
computer authentication, in which case this solution is not appropriate.

For additional considerations, see Choose a solution for integrating on-premises Active
Directory with Azure.

Architecture
The architecture has the following components.

 Azure AD tenant. An instance of Azure AD created by your organization. It acts as a directory service for cloud applications by storing objects copied from the on-premises Active Directory, and it provides identity services.
 Web tier subnet. This subnet holds VMs that run a web application. Azure AD can act as an identity broker for this application.
 On-premises AD DS server. An on-premises directory and identity service. The AD DS directory can be synchronized with Azure AD to enable it to authenticate on-premises users.
 Azure AD Connect sync server. An on-premises computer that runs the Azure AD
Connect sync service. This service synchronizes information held in the on-premises
Active Directory to Azure AD. For example, if you provision or deprovision groups
and users on-premises, these changes propagate to Azure AD.

Note

For security reasons, Azure AD stores users' passwords as hashes. If a user requires a password reset, this must be performed on-premises and the new hash must be sent to Azure AD. Azure AD Premium editions include features that can automate this task to enable users to reset their own passwords.

 VMs for N-tier application. The deployment includes infrastructure for an N-tier
application. For more information about these resources, see Run VMs for an N-tier
architecture.

Recommendations
The following recommendations apply for most scenarios. Follow these recommendations
unless you have a specific requirement that overrides them.

Azure AD Connect sync service

The Azure AD Connect sync service ensures that identity information stored in the cloud is
consistent with that held on-premises. You install this service using the Azure AD Connect
software.

Before implementing Azure AD Connect sync, determine the synchronization requirements of your organization. For example, what to synchronize, from which domains, and how frequently. For more information, see Determine directory synchronization requirements.

You can run the Azure AD Connect sync service on a VM or a computer hosted on-premises.
Depending on the volatility of the information in your Active Directory directory, the load on
the Azure AD Connect sync service is unlikely to be high after the initial synchronization
with Azure AD. Running the service on a VM makes it easier to scale the server if needed.
Monitor the activity on the VM as described in the Monitoring considerations section to
determine whether scaling is necessary.

If you have multiple on-premises domains in a forest, we recommend storing and synchronizing information for the entire forest to a single Azure AD tenant. Filter information for identities that occur in more than one domain, so that each identity appears only once in Azure AD, rather than being duplicated. Duplication can lead to inconsistencies when data is synchronized. For more information, see the Topology section below.

Use filtering so that only necessary data is stored in Azure AD. For example, your
organization might not want to store information about inactive accounts in Azure AD.
Filtering can be group-based, domain-based, organizational unit (OU)-based, or attribute-based. You can combine filters to generate more complex rules. For example, you could
synchronize objects held in a domain that have a specific value in a selected attribute. For
detailed information, see Azure AD Connect sync: Configure Filtering.

To implement high availability for the AD Connect sync service, run a secondary staging
server. For more information, see the Topology recommendations section.

Security recommendations

User password management. The Azure AD Premium editions support password writeback,
enabling your on-premises users to perform self-service password resets from within the
Azure portal. This feature should only be enabled after reviewing your organization's
password security policy. For example, you can restrict which users can change their
passwords, and you can tailor the password management experience. For more information,
see Customizing Password Management to fit your organization's needs.

Protect on-premises applications that can be accessed externally. Use the Azure AD
Application Proxy to provide controlled access to on-premises web applications for external
users through Azure AD. Only users that have valid credentials in your Azure directory have
permission to use the application. For more information, see the article Enable Application
Proxy in the Azure portal.

Actively monitor Azure AD for signs of suspicious activity. Consider using Azure AD
Premium P2 edition, which includes Azure AD Identity Protection. Identity Protection uses
adaptive machine learning algorithms and heuristics to detect anomalies and risk events that
may indicate that an identity has been compromised. For example, it can detect potentially
unusual activity such as irregular sign-in activities, sign-ins from unknown sources or from IP
addresses with suspicious activity, or sign-ins from devices that may be infected. Using this
data, Identity Protection generates reports and alerts that enable you to investigate these risk
events and take appropriate action. For more information, see Azure Active Directory
Identity Protection.

You can use the reporting feature of Azure AD in the Azure portal to monitor security-related
activities occurring in your system. For more information about using these reports, see
Azure Active Directory Reporting Guide.

Topology recommendations

Configure Azure AD Connect to implement a topology that most closely matches the
requirements of your organization. Topologies that Azure AD Connect supports include the
following:

 Single forest, single Azure AD directory. In this topology, Azure AD Connect
synchronizes objects and identity information from one or more domains in a single
on-premises forest into a single Azure AD tenant. This is the default topology
implemented by the express installation of Azure AD Connect.

Note

Don't use multiple Azure AD Connect sync servers to connect different domains in
the same on-premises forest to the same Azure AD tenant, unless you are running a
server in staging mode, described below.

 Multiple forests, single Azure AD directory. In this topology, Azure AD Connect
synchronizes objects and identity information from multiple forests into a single
Azure AD tenant. Use this topology if your organization has more than one on-
premises forest. You can consolidate identity information so that each unique user is
represented once in the Azure AD directory, even if the same user exists in more than
one forest. All forests use the same Azure AD Connect sync server. The Azure AD
Connect sync server does not have to be part of any domain, but it must be reachable
from all forests.
Note

In this topology, don't use separate Azure AD Connect sync servers to connect each
on-premises forest to a single Azure AD tenant. This can result in duplicated identity
information in Azure AD if users are present in more than one forest.

 Multiple forests, separate topologies. This topology merges identity information
from separate forests into a single Azure AD tenant, treating all forests as separate
entities. This topology is useful if you are combining forests from different
organizations and the identity information for each user is held in only one forest.

Note

If the global address lists (GAL) in each forest are synchronized, a user in one forest
may be present in another as a contact. This can occur if your organization has
implemented GALSync with Forefront Identity Manager 2010 or Microsoft Identity
Manager 2016. In this scenario, you can specify that users should be identified by
their Mail attribute. You can also match identities using the ObjectSID and
msExchMasterAccountSID attributes. This is useful if you have one or more resource
forests with disabled accounts.

 Staging server. In this configuration, you run a second instance of the Azure AD
Connect sync server in parallel with the first. This structure supports scenarios such
as:
o High availability.
o Testing and deploying a new configuration of the Azure AD Connect sync
server.
o Introducing a new server and decommissioning an old configuration.

In these scenarios, the second instance runs in staging mode. The server
records imported objects and synchronization data in its database, but does not
pass the data to Azure AD. If you disable staging mode, the server starts
writing data to Azure AD, and also starts performing password write-backs
into the on-premises directories where appropriate. For more information, see
Azure AD Connect sync: Operational tasks and considerations.

 Multiple Azure AD directories. We recommend that you create a single Azure
AD directory for an organization, but there may be situations where you need to
partition information across separate Azure AD directories. In this case, avoid
synchronization and password write-back issues by ensuring that each object from the
on-premises forest appears in only one Azure AD directory. To implement this
scenario, configure separate Azure AD Connect sync servers for each Azure AD
directory, and use filtering so that each Azure AD Connect sync server operates on a
mutually exclusive set of objects.

For more information about these topologies, see Topologies for Azure AD Connect.
User authentication

By default, the Azure AD Connect sync server configures password hash synchronization
between the on-premises domain and Azure AD, and the Azure AD service assumes that
users authenticate by providing the same password that they use on-premises. For many
organizations, this is appropriate, but you should consider your organization's existing
policies and infrastructure. For example:

 The security policy of your organization may prohibit synchronizing password hashes to the
cloud. In this case your organization should consider pass-through authentication.
 You might require that users experience seamless single sign-on (SSO) when accessing cloud
resources from domain-joined machines on the corporate network.
 Your organization might already have Active Directory Federation Services (AD FS) or a third-
party federation provider deployed. You can configure Azure AD to use this infrastructure to
implement authentication and SSO rather than using password information held in the
cloud.

For more information, see Azure AD Connect User Sign on options.

Azure AD application proxy

Use Azure AD to provide access to on-premises applications.

Expose your on-premises web applications using application proxy connectors managed by
the Azure AD application proxy component. The application proxy connector opens an
outbound network connection to the Azure AD application proxy, and remote users' requests
are routed back from Azure AD through this connection to the web apps. This removes the
need to open inbound ports in the on-premises firewall and reduces the attack surface
exposed by your organization.

For more information, see Publish applications using Azure AD Application proxy.

Object synchronization

Azure AD Connect's default configuration synchronizes objects from your local Active
Directory directory based on the rules specified in the article Azure AD Connect sync:
Understanding the default configuration. Objects that satisfy these rules are synchronized
while all other objects are ignored. Some example rules:

 User objects must have a unique sourceAnchor attribute and the accountEnabled attribute
must be populated.
 User objects must have a sAMAccountName attribute and cannot start with the text Azure
AD_ or MSOL_.

Azure AD Connect applies several rules to User, Contact, Group, ForeignSecurityPrincipal,
and Computer objects. Use the Synchronization Rules Editor installed with Azure AD
Connect if you need to modify the default set of rules. For more information, see Azure AD
Connect sync: Understanding the default configuration.
You can also define your own filters to limit the objects to be synchronized by domain or
OU. Alternatively, you can implement more complex custom filtering such as that described
in Azure AD Connect sync: Configure Filtering.

Monitoring

Health monitoring is performed by the following agents installed on-premises:

 Azure AD Connect installs an agent that captures information about synchronization
operations. Use the Azure AD Connect Health blade in the Azure portal to monitor its health
and performance. For more information, see Using Azure AD Connect Health for sync.
 To monitor the health of the AD DS domains and directories from Azure, install the Azure AD
Connect Health for AD DS agent on a machine within the on-premises domain. Use the Azure
Active Directory Connect Health blade in the Azure portal for health monitoring. For more
information, see Using Azure AD Connect Health with AD DS.
 Install the Azure AD Connect Health for AD FS agent to monitor the health of services
running on-premises, and use the Azure Active Directory Connect Health blade in the
Azure portal to monitor AD FS. For more information, see Using Azure AD Connect Health
with AD FS.

For more information on installing the AD Connect Health agents and their requirements, see
Azure AD Connect Health Agent Installation.

Scalability considerations
The Azure AD service supports scalability based on replicas, with a single primary replica
that handles write operations plus multiple read-only secondary replicas. Azure AD
transparently redirects attempted writes made against secondary replicas to the primary
replica and provides eventual consistency. All changes made to the primary replica are
propagated to the secondary replicas. This architecture scales well because most operations
against Azure AD are reads rather than writes. For more information, see Azure AD: Under
the hood of our geo-redundant, highly available, distributed cloud directory.

For the Azure AD Connect sync server, determine how many objects you are likely to
synchronize from your local directory. If you have less than 100,000 objects, you can use the
default SQL Server Express LocalDB software provided with Azure AD Connect. If you
have a larger number of objects, you should install a production version of SQL Server and
perform a custom installation of Azure AD Connect, specifying that it should use an existing
instance of SQL Server.

Availability considerations
The Azure AD service is geo-distributed and runs in multiple data centers spread around the
world with automated failover. If a data center becomes unavailable, Azure AD ensures that
your directory data is available for instant access in at least two more regionally dispersed
data centers.

Note
The service level agreement (SLA) for Azure AD Basic and Premium services guarantees at
least 99.9% availability. There is no SLA for the Free tier of Azure AD. For more
information, see SLA for Azure Active Directory.

Consider provisioning a second instance of Azure AD Connect sync server in staging mode
to increase availability, as discussed in the topology recommendations section.

If you are not using the SQL Server Express LocalDB instance that comes with Azure AD
Connect, consider using SQL clustering to achieve high availability. Solutions such as
mirroring and Always On are not supported by Azure AD Connect.

For additional considerations about achieving high availability of the Azure AD Connect
sync server and also how to recover after a failure, see Azure AD Connect sync: Operational
tasks and considerations - Disaster Recovery.

Manageability considerations
There are two aspects to managing Azure AD:

 Administering Azure AD in the cloud.
 Maintaining the Azure AD Connect sync servers.

Azure AD provides the following options for managing domains and directories in the cloud:

 Azure Active Directory PowerShell Module. Use this module if you need to script common
Azure AD administrative tasks such as user management, domain management, and
configuring single sign-on.
 Azure AD management blade in the Azure portal. This blade provides an interactive
management view of the directory, and enables you to control and configure most aspects
of Azure AD.

Azure AD Connect installs the following tools to maintain Azure AD Connect sync services
from your on-premises machines:

 Microsoft Azure Active Directory Connect console. This tool enables you to modify the
configuration of the Azure AD Sync server, customize how synchronization occurs, enable or
disable staging mode, and switch the user sign-in mode. Note that you can enable Active
Directory Federation Services (AD FS) sign-in using your on-premises infrastructure.
 Synchronization Service Manager. Use the Operations tab in this tool to manage the
synchronization process and detect whether any parts of the process have failed. You can
trigger synchronizations manually using this tool. The Connectors tab enables you to control
the connections for the domains that the synchronization engine is attached to.
 Synchronization Rules Editor. Use this tool to customize the way objects are transformed
when they are copied between an on-premises directory and Azure AD. This tool enables
you to specify additional attributes and objects for synchronization, and to execute filters to
determine which objects should or should not be synchronized. For more information, see
the Synchronization Rule Editor section in the document Azure AD Connect sync:
Understanding the default configuration.
For more information and tips for managing Azure AD Connect, see Azure AD Connect
sync: Best practices for changing the default configuration.

Security considerations
Use conditional access control to deny authentication requests from unexpected sources:

 Trigger Azure Multi-Factor Authentication (MFA) if a user attempts to connect from
a nontrusted location such as across the Internet instead of a trusted network.
 Use the device platform type of the user (iOS, Android, Windows Mobile, Windows)
to determine access policy to applications and features.
 Record the enabled/disabled state of users' devices, and incorporate this information
into the access policy checks. For example, if a user's phone is lost or stolen it should
be recorded as disabled to prevent it from being used to gain access.
 Control user access to resources based on group membership. Use Azure AD dynamic
membership rules to simplify group administration. For a brief overview of how this
works, see Introduction to Dynamic Memberships for Groups.
 Use conditional access risk policies with Azure AD Identity Protection to provide
advanced protection based on unusual sign-in activities or other events.

For more information, see Azure Active Directory conditional access.

Deploy the solution

A deployment for a reference architecture that implements these recommendations and
considerations is available on GitHub. This reference architecture deploys a simulated on-
premises network in Azure that you can use to test and experiment. The reference
architecture can be deployed with either Windows or Linux VMs by following the
directions below:

1. Click the button below:

2. Once the link has opened in the Azure portal, you must enter values for some of the settings:
o The Resource group name is already defined in the parameter file, so select Create
New and enter ra-aad-onpremise-rg in the text box.
o Select the region from the Location drop down box.
o Do not edit the Template Root Uri or the Parameter Root Uri text boxes.
o Select windows or linux in the Os Type drop down box.
o Review the terms and conditions, then click the I agree to the terms and conditions
stated above checkbox.
o Click the Purchase button.
3. Wait for the deployment to complete.
4. The parameter files include hard-coded administrator user names and passwords, and it is
strongly recommended that you immediately change both on all the VMs. Click each VM in
the Azure Portal then click on Reset password in the Support + troubleshooting blade.
Select Reset password in the Mode drop down box, then select a new User name and
Password. Click the Update button to persist the new user name and password.
Extend Active Directory Domain Services
(AD DS) to Azure
This reference architecture shows how to extend your Active Directory environment to Azure
to provide distributed authentication services using Active Directory Domain Services (AD
DS).

AD DS is used to authenticate user, computer, application, or other identities that are
included in a security domain. It can be hosted on-premises, but if your application is hosted
partly on-premises and partly in Azure, it may be more efficient to replicate this functionality
in Azure. This can reduce the latency caused by sending authentication and local
authorization requests from the cloud back to AD DS running on-premises.

This architecture is commonly used when the on-premises network and the Azure virtual
network are connected by a VPN or ExpressRoute connection. This architecture also supports
bidirectional replication, meaning changes can be made either on-premises or in the cloud,
and both sources will be kept consistent. Typical uses for this architecture include hybrid
applications in which functionality is distributed between on-premises and Azure, and
applications and services that perform authentication using Active Directory.

Architecture
This architecture extends the architecture shown in DMZ between Azure and the Internet. It
has the following components.

 On-premises network. The on-premises network includes local Active Directory servers that
can perform authentication and authorization for components located on-premises.
 Active Directory servers. These are domain controllers implementing directory services (AD
DS) running as VMs in the cloud. These servers can provide authentication of components
running in your Azure virtual network.
 Active Directory subnet. The AD DS servers are hosted in a separate subnet. Network
security group (NSG) rules protect the AD DS servers and provide a firewall against traffic
from unexpected sources.
 Azure Gateway and Active Directory synchronization. The Azure gateway provides a
connection between the on-premises network and the Azure VNet. This can be a VPN
connection or Azure ExpressRoute. All synchronization requests between the Active
Directory servers in the cloud and on-premises pass through the gateway. User-defined
routes (UDRs) handle routing for on-premises traffic that passes to Azure. Traffic to and from
the Active Directory servers does not pass through the network virtual appliances (NVAs)
used in this scenario.

For more information about configuring UDRs and the NVAs, see Implementing a secure
hybrid network architecture in Azure.

Recommendations
The following recommendations apply for most scenarios. Follow these recommendations
unless you have a specific requirement that overrides them.

VM recommendations

Determine your VM size requirements based on the expected volume of authentication
requests. Use the specifications of the machines hosting AD DS on premises as a starting
point, and match them with the Azure VM sizes. Once deployed, monitor utilization and
scale up or down based on the actual load on the VMs. For more information about sizing AD
DS domain controllers, see Capacity Planning for Active Directory Domain Services.

Create a separate virtual data disk for storing the database, logs, and SYSVOL for Active
Directory. Do not store these items on the same disk as the operating system. Note that by
default, data disks that are attached to a VM use write-through caching. However, this form
of caching can conflict with the requirements of AD DS. For this reason, set the Host Cache
Preference setting on the data disk to None. For more information, see Placement of the
Windows Server AD DS database and SYSVOL.
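As a sketch only, a dedicated data disk with host caching disabled could be attached with the Azure CLI. The resource group, VM name, disk name, and size below are illustrative placeholders, not values from this architecture:

```shell
# Attach a new data disk for the AD DS database, logs, and SYSVOL.
# Host caching is disabled (--caching None), as AD DS requires.
# All names and the disk size are placeholders.
az vm disk attach \
  --resource-group ra-adds-rg \
  --vm-name adds-vm1 \
  --name adds-data-disk \
  --new \
  --size-gb 128 \
  --caching None
```

After attaching the disk, initialize and format it from within the guest OS before moving the AD DS database and SYSVOL onto it.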

Deploy at least two VMs running AD DS as domain controllers and add them to an
availability set.
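A minimal sketch of creating the availability set with the Azure CLI follows; the resource group and set name are hypothetical, and each domain controller VM would then be created with the `--availability-set` option of `az vm create`:

```shell
# Create an availability set for the domain controller VMs.
# Resource group and availability set names are placeholders.
az vm availability-set create \
  --resource-group ra-adds-rg \
  --name adds-avset \
  --platform-fault-domain-count 2
```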

Networking recommendations

Configure the VM network interface (NIC) for each AD DS server with a static private IP
address for full domain name service (DNS) support. For more information, see How to set a
static private IP address in the Azure portal.
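The same configuration can be sketched with the Azure CLI; assigning an explicit address switches the NIC's IP configuration to static allocation. The resource group, NIC name, and address here are assumed placeholders:

```shell
# Set a static private IP address on the domain controller's NIC.
# Resource group, NIC name, and address are placeholders.
az network nic ip-config update \
  --resource-group ra-adds-rg \
  --nic-name adds-vm1-nic \
  --name ipconfig1 \
  --private-ip-address 10.0.4.4
```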

Note

Do not configure the VM NIC for any AD DS with a public IP address. See Security
considerations for more details.
The Active Directory subnet NSG requires rules to permit incoming traffic from on-premises.
For detailed information on the ports used by AD DS, see Active Directory and Active
Directory Domain Services Port Requirements. Also, ensure the UDR tables do not route AD
DS traffic through the NVAs used in this architecture.
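As an illustrative sketch, one such NSG rule could be created with the Azure CLI. The NSG name, source address prefix, and abbreviated port list are placeholders; consult the AD DS port requirements article for the complete set of ports and protocols:

```shell
# Allow inbound AD DS traffic from the on-premises address range.
# NSG name, address prefix, and port list are illustrative only.
az network nsg rule create \
  --resource-group ra-adds-rg \
  --nsg-name adds-nsg \
  --name AllowOnPremAdDs \
  --priority 200 \
  --direction Inbound \
  --access Allow \
  --protocol '*' \
  --source-address-prefixes 192.168.0.0/16 \
  --destination-port-ranges 53 88 135 389 445 464 636
```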

Active Directory site

In AD DS, a site represents a physical location, network, or collection of devices. AD DS
sites are used to manage AD DS database replication by grouping together AD DS objects
that are located close to one another and are connected by a high speed network. AD DS
includes logic to select the best strategy for replicating the AD DS database between sites.

We recommend that you create an AD DS site including the subnets defined for your
application in Azure. Then, configure a site link between your on-premises AD DS sites, and
AD DS will automatically perform the most efficient database replication possible. Note that
this database replication requires little beyond the initial configuration.

Active Directory operations masters

The operations masters role can be assigned to AD DS domain controllers to support
consistency checking between instances of replicated AD DS databases. There are five
operations master roles: schema master, domain naming master, relative identifier master,
primary domain controller master emulator, and infrastructure master. For more information
about these roles, see What are Operations Masters?.

We recommend you do not assign operations masters roles to the domain controllers
deployed in Azure.

Monitoring

Monitor the resources of the domain controller VMs as well as the AD DS Services and
create a plan to quickly correct any problems. For more information, see Monitoring Active
Directory. You can also install tools such as Microsoft Systems Center on the monitoring
server (see the architecture diagram) to help perform these tasks.

Scalability considerations
AD DS is designed for scalability. You don't need to configure a load balancer or traffic
controller to direct requests to AD DS domain controllers. The only scalability consideration
is to configure the VMs running AD DS with the correct size for your network load
requirements, monitor the load on the VMs, and scale up or down as necessary.
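If monitoring shows sustained load, a VM can be resized with the Azure CLI, sketched below with placeholder names. The target size is an assumption; valid options for a given VM can be listed with `az vm list-vm-resize-options`:

```shell
# Resize a domain controller VM after monitoring shows sustained load.
# Resource group, VM name, and target size are placeholders.
az vm resize \
  --resource-group ra-adds-rg \
  --name adds-vm1 \
  --size Standard_DS2_v2
```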

Availability considerations
Deploy the VMs running AD DS into an availability set. Also, consider assigning the role of
standby operations master to at least one server, and possibly more depending on your
requirements. A standby operations master is an active copy of the operations master that can
be used in place of the primary operations master server during failover.
Manageability considerations
Perform regular AD DS backups. Don't copy the VHD files of domain controllers as a
substitute for regular backups, because the AD DS database file on the VHD may not be in a
consistent state when it's copied, making it impossible to restart the database.

Do not shut down a domain controller VM using the Azure portal. Instead, shut down and restart
from the guest operating system. Shutting down through the portal causes the VM to be
deallocated, which resets both the VM-GenerationID and the invocationID of the Active
Directory repository. This discards the AD DS relative identifier (RID) pool and marks
SYSVOL as nonauthoritative, and may require reconfiguration of the domain controller.

Security considerations
AD DS servers provide authentication services and are an attractive target for attacks. To
secure them, prevent direct Internet connectivity by placing the AD DS servers in a separate
subnet with an NSG acting as a firewall. Close all ports on the AD DS servers except those
necessary for authentication, authorization, and server synchronization. For more
information, see Active Directory and Active Directory Domain Services Port Requirements.

Consider implementing an additional security perimeter around servers with a pair of subnets
and NVAs, as described in Implementing a secure hybrid network architecture with Internet
access in Azure.

Use either BitLocker or Azure disk encryption to encrypt the disk hosting the AD DS
database.
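A hedged sketch of enabling Azure Disk Encryption with the Azure CLI follows; the resource group, VM, and Key Vault names are placeholders, and the Key Vault must already exist in the same region as the VM:

```shell
# Enable Azure Disk Encryption on the domain controller's disks,
# using an existing Key Vault. All names are placeholders.
az vm encryption enable \
  --resource-group ra-adds-rg \
  --name adds-vm1 \
  --disk-encryption-keyvault adds-keyvault \
  --volume-type All
```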

Deploy the solution


A deployment for this architecture is available on GitHub. Note that the entire deployment
can take up to two hours, which includes creating the VPN gateway and running the scripts
that configure AD DS.

Prerequisites

1. Clone, fork, or download the zip file for the reference architectures GitHub
repository.
2. Install Azure CLI 2.0.
3. Install the Azure building blocks npm package:

bash

npm install -g @mspnp/azure-building-blocks

4. From a command prompt, bash prompt, or PowerShell prompt, sign into your Azure
account as follows:

bash

az login

Deploy the simulated on-premises datacenter

1. Navigate to the identity/adds-extend-domain folder of the GitHub repository.
2. Open the onprem.json file. Search for instances of adminPassword and Password
and add values for the passwords.
3. Run the following command and wait for the deployment to finish:

bash

azbb -s <subscription_id> -g <resource group> -l <location> -p onprem.json --deploy

Deploy the Azure VNet

1. Open the azure.json file. Search for instances of adminPassword and Password and
add values for the passwords.
2. In the same file, search for instances of sharedKey and enter shared keys for the VPN
connection.

bash

"sharedKey": "",

3. Run the following command and wait for the deployment to finish. Deploy to the same
resource group as the on-premises VNet.

bash

azbb -s <subscription_id> -g <resource group> -l <location> -p azure.json --deploy

Test connectivity with the Azure VNet

After deployment completes, you can test connectivity from the simulated on-premises
environment to the Azure VNet.

1. In the Azure portal, navigate to the resource group that you created.
2. Find the VM named ra-onpremise-mgmt-vm1.
3. Click Connect to open a remote desktop session to the VM. The username is
contoso\testuser, and the password is the one that you specified in the
onprem.json parameter file.
4. From inside your remote desktop session, open another remote desktop session to
10.0.4.4, which is the IP address of the VM named adds-vm1. The username is
contoso\testuser, and the password is the one that you specified in the
azure.json parameter file.
5. From inside the remote desktop session for adds-vm1, go to Server Manager and
click Add other servers to manage.
6. In the Active Directory tab, click Find now. You should see a list of the AD, AD
DS, and Web VMs.
Create an Active Directory Domain
Services (AD DS) resource forest in Azure
This reference architecture shows how to create a separate Active Directory domain in Azure
that is trusted by domains in your on-premises AD forest.

Active Directory Domain Services (AD DS) stores identity information in a hierarchical
structure. The top node in the hierarchical structure is known as a forest. A forest contains
domains, and domains contain other types of objects. This reference architecture creates an
AD DS forest in Azure with a one-way outgoing trust relationship with an on-premises
domain. The forest in Azure contains a domain that does not exist on-premises. Because of
the trust relationship, logons made against on-premises domains can be trusted for access to
resources in the separate Azure domain.

Typical uses for this architecture include maintaining security separation for objects and
identities held in the cloud, and migrating individual domains from on-premises to the cloud.

For additional considerations, see Choose a solution for integrating on-premises Active
Directory with Azure.

Architecture
The architecture has the following components.

 On-premises network. The on-premises network contains its own Active Directory forest
and domains.
 Active Directory servers. These are domain controllers implementing domain services
running as VMs in the cloud. These servers host a forest containing one or more domains,
separate from those located on-premises.
 One-way trust relationship. The example in the diagram shows a one-way trust from the
domain in Azure to the on-premises domain. This relationship enables on-premises users to
access resources in the domain in Azure, but not the other way around. It is possible to
create a two-way trust if cloud users also require access to on-premises resources.
 Active Directory subnet. The AD DS servers are hosted in a separate subnet. Network
security group (NSG) rules protect the AD DS servers and provide a firewall against traffic
from unexpected sources.
 Azure gateway. The Azure gateway provides a connection between the on-premises
network and the Azure VNet. This can be a VPN connection or Azure ExpressRoute. For more
information, see Implementing a secure hybrid network architecture in Azure.

Recommendations
For specific recommendations on implementing Active Directory in Azure, see the following
articles:

 Extending Active Directory Domain Services (AD DS) to Azure.
 Guidelines for Deploying Windows Server Active Directory on Azure Virtual Machines.

Trust

The on-premises domains are contained within a different forest from the domains in the
cloud. To enable authentication of on-premises users in the cloud, the domains in Azure must
trust the logon domain in the on-premises forest. Similarly, if the cloud provides a logon
domain for external users, it may be necessary for the on-premises forest to trust the cloud
domain.

You can establish trusts at the forest level by creating forest trusts, or at the domain level by
creating external trusts. A forest level trust creates a relationship between all domains in two
forests. An external domain level trust only creates a relationship between two specified
domains. You should only create external domain level trusts between domains in different
forests.

Trusts can be unidirectional (one-way) or bidirectional (two-way):

 A one-way trust enables users in one domain or forest (known as the incoming domain or
forest) to access the resources held in another (the outgoing domain or forest).
 A two-way trust enables users in either domain or forest to access resources held in the
other.

The following summarizes trust configurations for some simple scenarios:

 On-premises users require access to resources in the cloud, but not vice versa: the on-
premises trust is one-way, incoming; the cloud trust is one-way, outgoing.
 Users in the cloud require access to resources located on-premises, but not vice versa: the
on-premises trust is one-way, outgoing; the cloud trust is one-way, incoming.
 Users in the cloud and on-premises both require access to resources held in the cloud and
on-premises: both trusts are two-way, incoming and outgoing.

Scalability considerations
Active Directory is automatically scalable for domain controllers that are part of the same
domain. Requests are distributed across all controllers within a domain. You can add another
domain controller, and it synchronizes automatically with the domain. Do not configure a
separate load balancer to direct traffic to controllers within the domain. Ensure that all
domain controllers have sufficient memory and storage resources to handle the domain
database. Make all domain controller VMs the same size.

Availability considerations
Provision at least two domain controllers for each domain. This enables automatic replication
between servers. Create an availability set for the VMs acting as Active Directory servers
handling each domain. Put at least two servers in this availability set.

Also, consider designating one or more servers in each domain as standby operations masters
in case connectivity to a server acting as a flexible single master operation (FSMO) role fails.

Manageability considerations
For information about management and monitoring considerations, see Extending Active
Directory to Azure.

For additional information, see Monitoring Active Directory. You can install tools such as
Microsoft Systems Center on a monitoring server in the management subnet to help perform
these tasks.

Security considerations
Forest level trusts are transitive. If you establish a forest level trust between an on-premises
forest and a forest in the cloud, this trust is extended to other new domains created in either
forest. If you use domains to provide separation for security purposes, consider creating trusts
at the domain level only. Domain level trusts are non-transitive.

For Active Directory-specific security considerations, see the security considerations section
in Extending Active Directory to Azure.

Deploy the solution


A deployment for this architecture is available on GitHub. Note that the entire deployment
can take up to two hours, which includes creating the VPN gateway and running the scripts
that configure AD DS.
Prerequisites

1. Clone, fork, or download the zip file for the reference architectures GitHub
   repository.
2. Install Azure CLI 2.0.
3. Install the Azure building blocks npm package:

   bash

   npm install -g @mspnp/azure-building-blocks

4. From a command prompt, bash prompt, or PowerShell prompt, sign in to your Azure
   account as follows:

   bash

   az login

Deploy the simulated on-premises datacenter

1. Navigate to the identity/adds-forest folder of the GitHub repository.


2. Open the onprem.json file. Search for instances of adminPassword and Password
   and add values for the passwords.
3. Run the following command and wait for the deployment to finish:

   bash

   azbb -s <subscription_id> -g <resource group> -l <location> -p onprem.json --deploy

Deploy the Azure VNet

1. Open the azure.json file. Search for instances of adminPassword and Password and
   add values for the passwords.
2. In the same file, search for instances of sharedKey and enter shared keys for the VPN
   connection:

   bash

   "sharedKey": "",

3. Run the following command and wait for the deployment to finish. Deploy to the
   same resource group as the on-premises VNet:

   bash

   azbb -s <subscription_id> -g <resource group> -l <location> -p azure.json --deploy


Test the AD trust relation

1. In the Azure portal, navigate to the resource group that you created.
2. Use the Azure portal to find the VM named ra-adt-mgmt-vm1.
3. Click Connect to open a remote desktop session to the VM. The username is
   contoso\testuser, and the password is the one that you specified in the
   azure.json parameter file.
4. From inside your remote desktop session, open another remote desktop session to
   192.168.0.4, which is the IP address of the VM named ra-adtrust-onpremise-ad-vm1.
   The username is contoso\testuser, and the password is the one that you
   specified in the onprem.json parameter file.
5. From inside the remote desktop session for ra-adtrust-onpremise-ad-vm1, go to
Server Manager and click Tools > Active Directory Domains and Trusts.
6. In the left pane, right-click contoso.com and select Properties.
7. Click the Trusts tab. You should see treyresearch.net listed as an incoming trust.
Extend Active Directory Federation
Services (AD FS) to Azure
This reference architecture implements a secure hybrid network that extends your on-
premises network to Azure and uses Active Directory Federation Services (AD FS) to
perform federated authentication and authorization for components running in Azure.

AD FS can be hosted on-premises, but if your application is a hybrid in which some parts are
implemented in Azure, it may be more efficient to replicate AD FS in the cloud.

The diagram shows the following scenarios:

 Application code from a partner organization accesses a web application hosted inside your
Azure VNet.
 An external, registered user with credentials stored inside Active Directory Domain Services
(DS) accesses a web application hosted inside your Azure VNet.
 A user connected to your VNet using an authorized device executes a web application
hosted inside your Azure VNet.

Typical uses for this architecture include:

 Hybrid applications where workloads run partly on-premises and partly in Azure.
 Solutions that use federated authorization to expose web applications to partner
organizations.
 Systems that support access from web browsers running outside of the organizational
firewall.
 Systems that enable users to access to web applications by connecting from authorized
external devices such as remote computers, notebooks, and other mobile devices.

This reference architecture focuses on passive federation, in which the federation servers
decide how and when to authenticate a user. The user provides sign in information when the
application is started. This mechanism is most commonly used by web browsers and involves
a protocol that redirects the browser to a site where the user authenticates. AD FS also
supports active federation, where an application takes on responsibility for supplying
credentials without further user interaction, but that scenario is outside the scope of this
architecture.

Architecture
This architecture extends the implementation described in Extending AD DS to Azure. It
contains the following components.

 AD DS subnet. The AD DS servers are contained in their own subnet with network
security group (NSG) rules acting as a firewall.
 AD DS servers. Domain controllers running as VMs in Azure. These servers provide
authentication of local identities within the domain.
 AD FS subnet. The AD FS servers are located within their own subnet with NSG
rules acting as a firewall.
 AD FS servers. The AD FS servers provide federated authorization and
authentication. In this architecture, they perform the following tasks:
o Receiving security tokens containing claims made by a partner federation
server on behalf of a partner user. AD FS verifies that the tokens are valid
before passing the claims to the web application running in Azure to authorize
requests.

The web application running in Azure is the relying party. The partner
federation server must issue claims that are understood by the web application.
The partner federation servers are referred to as account partners, because
they submit access requests on behalf of authenticated accounts in the partner
organization. The AD FS servers are called resource partners because they
provide access to resources (the web application).

o Authenticating and authorizing incoming requests from external users running
a web browser or device that needs access to web applications, by using AD
DS and the Active Directory Device Registration Service.

The AD FS servers are configured as a farm accessed through an Azure load balancer.
This implementation improves availability and scalability. The AD FS servers are not
exposed directly to the Internet. All Internet traffic is filtered through AD FS web
application proxy servers and a DMZ (also referred to as a perimeter network).

For more information about how AD FS works, see Active Directory Federation
Services Overview. Also, the article AD FS deployment in Azure contains a detailed
step-by-step introduction to implementation.
 AD FS proxy subnet. The AD FS proxy servers can be contained within their own
subnet, with NSG rules providing protection. The servers in this subnet are exposed to
the Internet through a set of network virtual appliances that provide a firewall
between your Azure virtual network and the Internet.
 AD FS web application proxy (WAP) servers. These VMs act as AD FS servers for
incoming requests from partner organizations and external devices. The WAP servers
act as a filter, shielding the AD FS servers from direct access from the Internet. As
with the AD FS servers, deploying the WAP servers in a farm with load balancing
gives you greater availability and scalability than deploying a collection of stand-
alone servers.

Note

For detailed information about installing WAP servers, see Install and Configure the
Web Application Proxy Server

 Partner organization. A partner organization running a web application that requests
access to a web application running in Azure. The federation server at the partner
organization authenticates requests locally, and submits security tokens containing
claims to AD FS running in Azure. AD FS in Azure validates the security tokens, and
if valid can pass the claims to the web application running in Azure to authorize them.

Note

You can also configure a VPN tunnel using Azure gateway to provide direct access to
AD FS for trusted partners. Requests received from these partners do not pass through
the WAP servers.

Recommendations
The following recommendations apply for most scenarios. Follow these recommendations
unless you have a specific requirement that overrides them.

VM recommendations

Create VMs with sufficient resources to handle the expected volume of traffic. Use the size of
the existing machines hosting AD FS on premises as a starting point. Monitor the resource
utilization. You can resize the VMs and scale down if they are too large.

Follow the recommendations listed in Running a Windows VM on Azure.

Networking recommendations

Configure the network interface for each of the VMs hosting AD FS and WAP servers with
static private IP addresses.

Do not give the AD FS VMs public IP addresses. For more information, see the Security
considerations section.
Set the IP address of the preferred and secondary domain name service (DNS) servers for the
network interfaces for each AD FS and WAP VM to reference the Active Directory DS VMs.
The Active Directory DS VMs should be running DNS. This step is necessary to enable each
VM to join the domain.
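As a sketch, the DNS setting can be applied with the Azure CLI; the resource group name, NIC name, and the AD DS server addresses below are placeholders for your own values:

```bash
az network nic update --resource-group <resource-group> \
    --name <adfs-vm-nic> --dns-servers 10.0.0.4 10.0.0.5
```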

AD FS availability

Create an AD FS farm with at least two servers to increase availability of the service. Use
different storage accounts for each AD FS VM in the farm. This approach helps to ensure that
a failure in a single storage account does not make the entire farm inaccessible.

Important

We recommend the use of managed disks. Managed disks do not require a storage account.
You simply specify the size and type of disk and it is deployed in a highly available way. Our
reference architectures do not currently deploy managed disks but the template building
blocks will be updated to deploy managed disks in version 2.

Create separate Azure availability sets for the AD FS and WAP VMs. Ensure that there are at
least two VMs in each set. Each availability set must have at least two update domains and
two fault domains.
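For example, an availability set meeting these requirements could be created with the Azure CLI; the group and set names are placeholders, and two fault domains and two update domains are the minimums suggested above:

```bash
az vm availability-set create --resource-group <resource-group> --name <adfs-as> \
    --platform-fault-domain-count 2 --platform-update-domain-count 2
```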

Configure the load balancers for the AD FS VMs and WAP VMs as follows:

 Use an Azure load balancer to provide external access to the WAP VMs, and an
internal load balancer to distribute the load across the AD FS servers in the farm.
 Only pass traffic appearing on port 443 (HTTPS) to the AD FS/WAP servers.
 Give the load balancer a static IP address.
 Create a health probe using HTTP against /adfs/probe. For more information, see
Hardware Load Balancer Health Checks and Web Application Proxy / AD FS 2012
R2.

Note

AD FS servers use the Server Name Indication (SNI) protocol, so attempting to probe
using an HTTPS endpoint from the load balancer fails.

 Add a DNS A record to the domain for the AD FS load balancer. Specify the IP
address of the load balancer, and give it a name in the domain (such as
adfs.contoso.com). This is the name clients and the WAP servers use to access the AD
FS server farm.

AD FS security

Prevent direct exposure of the AD FS servers to the Internet. AD FS servers are domain-
joined computers that have full authorization to grant security tokens. If a server is
compromised, a malicious user can issue full access tokens to all web applications and to all
federation servers that are protected by AD FS. If your system must handle requests from
external users not connecting from trusted partner sites, use WAP servers to handle these
requests. For more information, see Where to Place a Federation Server Proxy.
Place AD FS servers and WAP servers in separate subnets with their own firewalls. You can
use NSG rules to define firewall rules. If you require more comprehensive protection you can
implement an additional security perimeter around servers by using a pair of subnets and
network virtual appliances (NVAs), as described in the document Implementing a secure
hybrid network architecture with Internet access in Azure. All firewalls should allow traffic
on port 443 (HTTPS).

Restrict direct sign in access to the AD FS and WAP servers. Only DevOps staff should be
able to connect.

Do not join the WAP servers to the domain.

AD FS installation

The article Deploying a Federation Server Farm provides detailed instructions for installing
and configuring AD FS. Perform the following tasks before configuring the first AD FS
server in the farm:

1. Obtain a publicly trusted certificate for performing server authentication. The subject
name must contain the name clients use to access the federation service. This can be
the DNS name registered for the load balancer, for example, adfs.contoso.com (avoid
using wildcard names such as *.contoso.com, for security reasons). Use the same
certificate on all AD FS server VMs. You can purchase a certificate from a trusted
certification authority, but if your organization uses Active Directory Certificate
Services you can create your own.

The subject alternative name is used by the device registration service (DRS) to
enable access from external devices. This should be of the form
enterpriseregistration.contoso.com.

For more information, see Obtain and Configure a Secure Sockets Layer (SSL)
Certificate for AD FS.

2. On the domain controller, generate a new root key for the Key Distribution Service.
Set the effective time to the current time minus 10 hours (this configuration reduces
the delay that can occur in distributing and synchronizing keys across the domain).
This step is necessary to support creating the group service account that is used to run
the AD FS service. The following PowerShell command shows an example of how to
do this:

PowerShell

   Add-KdsRootKey -EffectiveTime ((Get-Date).AddHours(-10))

3. Add each AD FS server VM to the domain.

Note
To install AD FS, the domain controller running the primary domain controller (PDC)
emulator flexible single master operation (FSMO) role for the domain must be running and
accessible from the AD FS VMs.

AD FS trust

Establish federation trust between your AD FS installation, and the federation servers of any
partner organizations. Configure any claims filtering and mapping required.

 DevOps staff at each partner organization must add a relying party trust for the web
applications accessible through your AD FS servers.
 DevOps staff in your organization must configure claims-provider trust to enable your AD FS
servers to trust the claims that partner organizations provide.
 DevOps staff in your organization must also configure AD FS to pass claims on to your
organization's web applications.

For more information, see Establishing Federation Trust.

Publish your organization's web applications and make them available to external partners by
using preauthentication through the WAP servers. For more information, see Publish
Applications using AD FS Preauthentication

AD FS supports token transformation and augmentation. Azure Active Directory does not
provide this feature. With AD FS, when you set up the trust relationships, you can:

 Configure claim transformations for authorization rules. For example, you can map group
security from a representation used by a non-Microsoft partner organization to something
that Active Directory DS can authorize in your organization.
 Transform claims from one format to another. For example, you can map from SAML 2.0 to
SAML 1.1 if your application only supports SAML 1.1 claims.
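As an illustration of a claim transformation, an AD FS issuance rule written in the claim rule language might map a partner group claim to a role your applications understand. The group and role values here are hypothetical:

```
c:[Type == "http://schemas.xmlsoap.org/claims/Group", Value == "Partner Managers"]
  => issue(Type = "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
           Value = "Managers");
```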

AD FS monitoring

The Microsoft System Center Management Pack for Active Directory Federation Services
2012 R2 provides both proactive and reactive monitoring of your AD FS deployment for the
federation server. This management pack monitors:

 Events that the AD FS service records in its event logs.


 The performance data that the AD FS performance counters collect.
 The overall health of the AD FS system and web applications (relying parties), and provides
alerts for critical issues and warnings.

Scalability considerations
The following considerations, summarized from the article Plan your AD FS deployment,
give a starting point for sizing AD FS farms:

 If you have fewer than 1000 users, do not create dedicated servers, but instead install AD FS
on each of the Active Directory DS servers in the cloud. Make sure that you have at least two
Active Directory DS servers to maintain availability. Create a single WAP server.
 If you have between 1000 and 15000 users, create two dedicated AD FS servers and two
dedicated WAP servers.
 If you have between 15000 and 60000 users, create between three and five dedicated AD FS
servers and at least two dedicated WAP servers.

These considerations assume that you are using dual quad-core VM (Standard D4_v2, or
better) sizes in Azure.

If you are using the Windows Internal Database to store AD FS configuration data, you are
limited to eight AD FS servers in the farm. If you anticipate that you will need more in the
future, use SQL Server. For more information, see The Role of the AD FS Configuration
Database.
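The sizing bands above can be captured in a small helper. This is only a sketch of the guidance, not an official formula; the returned counts use the lower bound of each band.

```python
def adfs_farm_size(users: int) -> dict:
    """Rough AD FS farm sizing from the user-count bands in this article."""
    if users < 1000:
        # Co-locate AD FS on the existing AD DS servers; one WAP server.
        return {"dedicated_adfs": 0, "wap": 1}
    if users <= 15000:
        return {"dedicated_adfs": 2, "wap": 2}
    if users <= 60000:
        # Three to five AD FS servers; use the lower bound here.
        return {"dedicated_adfs": 3, "wap": 2}
    # Beyond the documented bands: plan a larger farm (SQL Server-backed).
    return {"dedicated_adfs": 5, "wap": 2}

print(adfs_farm_size(500))    # {'dedicated_adfs': 0, 'wap': 1}
print(adfs_farm_size(20000))  # {'dedicated_adfs': 3, 'wap': 2}
```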

Availability considerations
You can use either SQL Server or the Windows Internal Database to hold AD FS
configuration information. The Windows Internal Database provides basic redundancy.
Changes are written directly to only one of the AD FS databases in the AD FS cluster, while
the other servers use pull replication to keep their databases up to date. Using SQL Server can
provide full database redundancy and high availability using failover clustering or mirroring.

Manageability considerations
DevOps staff should be prepared to perform the following tasks:

 Managing the federation servers, including managing the AD FS farm, managing trust policy
on the federation servers, and managing the certificates used by the federation services.
 Managing the WAP servers including managing the WAP farm and certificates.
 Managing web applications including configuring relying parties, authentication methods,
and claims mappings.
 Backing up AD FS components.

Security considerations
AD FS utilizes the HTTPS protocol, so make sure that the NSG rules for the subnet
containing the web tier VMs permit HTTPS requests. These requests can originate from the
on-premises network, the subnets containing the web tier, business tier, data tier, private
DMZ, public DMZ, and the subnet containing the AD FS servers.

Consider using a set of network virtual appliances that logs detailed information on traffic
traversing the edge of your virtual network for auditing purposes.

Deploy the solution


A solution is available on GitHub to deploy this reference architecture. You will need the
latest version of the Azure CLI to run the PowerShell script that deploys the solution. To
deploy the reference architecture, follow these steps:
1. Download or clone the solution folder from GitHub to your local machine.
2. Open the Azure CLI and navigate to the local solution folder.
3. Run the following command:

PowerShell

 .\Deploy-ReferenceArchitecture.ps1 <subscription id> <location> <mode>

Replace <subscription id> with your Azure subscription ID.

For <location>, specify an Azure region, such as eastus or westus.

The <mode> parameter controls the granularity of the deployment, and can be one of the
following values:

 Onpremise: Deploys a simulated on-premises environment. You can use this deployment to
test and experiment if you do not have an existing on-premises network, or if you want to
test this reference architecture without changing the configuration of your existing on-
premises network.
 Infrastructure: deploys the VNet infrastructure and jump box.
 CreateVpn: deploys an Azure virtual network gateway and connects it to the simulated on-
premises network.
 AzureADDS: deploys the VMs acting as Active Directory DS servers, deploys Active Directory
to these VMs, and creates the domain in Azure.
 AdfsVm: deploys the AD FS VMs and joins them to the domain in Azure.
 PublicDMZ: deploys the public DMZ in Azure.
 ProxyVm: deploys the AD FS proxy VMs and joins them to the domain in Azure.
 Prepare: deploys all of the preceding deployments. This is the recommended option if you
are building an entirely new deployment and you don't have an existing on-premises
infrastructure.
 Workload: optionally deploys web, business, and data tier VMs and supporting network.
Not included in the Prepare deployment mode.
 PrivateDMZ: optionally deploys the private DMZ in Azure in front of the Workload VMs
deployed above. Not included in the Prepare deployment mode.

 Wait for the deployment to complete. If you used the Prepare option, the deployment
takes several hours to complete, and finishes with the message Preparation is
completed. Please install certificate to all AD FS and proxy VMs.

 Restart the jump box (ra-adfs-mgmt-vm1 in the ra-adfs-security-rg group) to allow its
DNS settings to take effect.

 Obtain an SSL Certificate for AD FS and install this certificate on the AD FS VMs. Note
that you can connect to them through the jump box. The IP addresses are 10.0.5.4 and
10.0.5.5. The default username is contoso\testuser with password AweS0me@PW.

Note

The comments in the Deploy-ReferenceArchitecture.ps1 script at this point provide detailed
instructions for creating a self-signed test certificate and authority using the makecert
command. However, perform these steps as a test only and do not use the certificates
generated by makecert in a production environment.

 Run the following PowerShell command to deploy the AD FS server farm:

PowerShell
 .\Deploy-ReferenceArchitecture.ps1 <subscription id> <location> Adfs

 On the jump box, browse to
https://adfs.contoso.com/adfs/ls/idpinitiatedsignon.htm to test the AD FS
installation (you may receive a certificate warning that you can ignore for this test). Verify
that the Contoso Corporation sign-in page appears. Sign in as contoso\testuser with password
AweS0me@PW.

 Install the SSL certificate on the AD FS proxy VMs. The IP addresses are 10.0.6.4 and
10.0.6.5.

 Run the following PowerShell command to deploy the first AD FS proxy server:

PowerShell
 .\Deploy-ReferenceArchitecture.ps1 <subscription id> <location> Proxy1

 Follow the instructions displayed by the script to test the installation of the first proxy
server.

 Run the following PowerShell command to deploy the second proxy server:

PowerShell
 .\Deploy-ReferenceArchitecture.ps1 <subscription id> <location> Proxy2

 Follow the instructions displayed by the script to test the complete proxy configuration.
N-tier application with SQL Server
This reference architecture shows how to deploy VMs and a virtual network configured for
an N-tier application, using SQL Server on Windows for the data tier.

Architecture
The architecture has the following components:

 Resource group. Resource groups are used to group resources so they can be
managed by lifetime, owner, or other criteria.
 Virtual network (VNet) and subnets. Every Azure VM is deployed into a VNet that
can be segmented into multiple subnets. Create a separate subnet for each tier.
 Application gateway. Azure Application Gateway is a layer 7 load balancer. In this
architecture, it routes HTTP requests to the web front end. Application Gateway also
provides a web application firewall (WAF) that protects the application from common
exploits and vulnerabilities.
 NSGs. Use network security groups (NSGs) to restrict network traffic within the
VNet. For example, in the 3-tier architecture shown here, the database tier does not
accept traffic from the web front end, only from the business tier and the management
subnet.
 Virtual machines. For recommendations on configuring VMs, see Run a Windows
VM on Azure and Run a Linux VM on Azure.
 Availability sets. Create an availability set for each tier, and provision at least two
VMs in each tier. This makes the VMs eligible for a higher service level agreement
(SLA) for VMs.
 VM scale set (not shown). A VM scale set is an alternative to using an availability
set. A scale set makes it easy to scale out the VMs in a tier, either manually or
automatically based on predefined rules.
 Load balancers. Use Azure Load Balancer to distribute network traffic from the web
tier to the business tier, and from the business tier to SQL Server.
 Public IP address. A public IP address is needed for the application to receive
Internet traffic.
 Jumpbox. Also called a bastion host. A secure VM on the network that
administrators use to connect to the other VMs. The jumpbox has an NSG that allows
remote traffic only from public IP addresses on a safe list. The NSG should permit
remote desktop (RDP) traffic.
 SQL Server Always On Availability Group. Provides high availability at the data
tier, by enabling replication and failover. It uses Windows Server Failover Cluster
(WSFC) technology for failover.
 Active Directory Domain Services (AD DS) Servers. The computer objects for the
failover cluster and its associated clustered roles are created in Active Directory
Domain Services (AD DS).
 Cloud Witness. A failover cluster requires more than half of its nodes to be running,
which is known as having quorum. If the cluster has just two nodes, a network
partition could cause each node to think it's the master node. In that case, you need a
witness to break ties and establish quorum. A witness is a resource such as a shared
disk that can act as a tie breaker to establish quorum. Cloud Witness is a type of
witness that uses Azure Blob Storage. To learn more about the concept of quorum, see
Understanding cluster and pool quorum. For more information about Cloud Witness,
see Deploy a Cloud Witness for a Failover Cluster.
 Azure DNS. Azure DNS is a hosting service for DNS domains, providing name
resolution using Microsoft Azure infrastructure. By hosting your domains in Azure,
you can manage your DNS records using the same credentials, APIs, tools, and billing
as your other Azure services.
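The quorum rule described for Cloud Witness above can be sketched as a simple majority check; it shows why a two-node cluster needs a witness vote to survive the loss of one node.

```python
def has_quorum(total_votes: int, votes_present: int) -> bool:
    # A cluster keeps quorum while more than half of its votes are present.
    return votes_present > total_votes // 2

# Two-node cluster without a witness: losing one node loses quorum.
print(has_quorum(2, 1))  # False
# A Cloud Witness adds a third vote that breaks the tie.
print(has_quorum(3, 2))  # True
```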

Recommendations
Your requirements might differ from the architecture described here. Use these
recommendations as a starting point.

VNet / Subnets

When you create the VNet, determine how many IP addresses your resources in each subnet
require. Specify a subnet mask and a VNet address range large enough for the required IP
addresses, using CIDR notation. Use an address space that falls within the standard private IP
address blocks, which are 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16.

Choose an address range that does not overlap with your on-premises network, in case you
need to set up a gateway between the VNet and your on-premises network later. Once you
create the VNet, you can't change the address range.
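A quick way to sanity-check an address plan is Python's ipaddress module. The VNet range, subnet size, and on-premises range below are hypothetical examples, not recommendations:

```python
import ipaddress

# Hypothetical plan: a /16 VNet in the 10.0.0.0/8 private block,
# carved into /24 subnets, one per tier.
vnet = ipaddress.ip_network("10.0.0.0/16")
web, business, data, mgmt = list(vnet.subnets(new_prefix=24))[:4]
print(web, business, data, mgmt)  # 10.0.0.0/24 10.0.1.0/24 10.0.2.0/24 10.0.3.0/24

# Confirm the VNet range does not overlap the (assumed) on-premises range,
# in case a gateway is needed later.
onprem = ipaddress.ip_network("192.168.0.0/24")
print(vnet.overlaps(onprem))  # False
```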

Design subnets with functionality and security requirements in mind. All VMs within the
same tier or role should go into the same subnet, which can be a security boundary. For more
information about designing VNets and subnets, see Plan and design Azure Virtual Networks.

Load balancers

Do not expose the VMs directly to the Internet, but instead give each VM a private IP
address. Clients connect using the public IP address associated with the Application Gateway.
Define load balancer rules to direct network traffic to the VMs. For example, to enable HTTP
traffic, create a rule that maps port 80 from the front-end configuration to port 80 on the
back-end address pool. When a client sends an HTTP request to port 80, the load balancer
selects a back-end IP address by using a hashing algorithm that includes the source IP
address. In that way, client requests are distributed across all the VMs.
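The hashing behavior can be illustrated with a toy sketch. Azure Load Balancer's real algorithm is a 5-tuple hash and is not this code; the point is only that a deterministic hash over connection attributes maps a given client consistently to the same back-end VM while the pool is unchanged. The back-end addresses are placeholders.

```python
import hashlib

backends = ["10.0.1.4", "10.0.1.5", "10.0.1.6"]  # hypothetical back-end pool

def pick_backend(src_ip: str, src_port: int) -> str:
    # Stable hash over the source tuple: the same inputs always select
    # the same back end for a given pool size.
    digest = hashlib.sha256(f"{src_ip}:{src_port}".encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]

chosen = pick_backend("203.0.113.10", 52100)
print(chosen in backends)                                # True
print(chosen == pick_backend("203.0.113.10", 52100))     # True: deterministic
```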

Network security groups

Use NSG rules to restrict traffic between tiers. For example, in the 3-tier architecture shown
above, the web tier does not communicate directly with the database tier. To enforce this, the
database tier should block incoming traffic from the web tier subnet.

1. Deny all inbound traffic from the VNet. (Use the VIRTUAL_NETWORK tag in the rule.)
2. Allow inbound traffic from the business tier subnet.
3. Allow inbound traffic from the database tier subnet itself. This rule allows communication
between the database VMs, which is needed for database replication and failover.
4. Allow RDP traffic (port 3389) from the jumpbox subnet. This rule lets administrators connect
to the database tier from the jumpbox.

Create rules 2 – 4 with higher priority than the first rule, so they override it.
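A minimal simulation of how these rules combine: NSG rules are evaluated in priority order (lower number first) and the first match wins, so the specific allow rules (2-4) need lower priority numbers than the broad deny (rule 1). The subnet names and rule shapes are simplified placeholders.

```python
# Simplified model of the database tier's inbound NSG rules.
rules = [
    {"priority": 100, "source": "business-subnet", "access": "Allow"},
    {"priority": 110, "source": "database-subnet", "access": "Allow"},
    {"priority": 120, "source": "jumpbox-subnet", "access": "Allow"},
    {"priority": 200, "source": "VIRTUAL_NETWORK", "access": "Deny"},
]

def evaluate(source: str) -> str:
    # Lower priority number is evaluated first; first match wins.
    for rule in sorted(rules, key=lambda r: r["priority"]):
        # The VIRTUAL_NETWORK tag matches any source inside the VNet.
        if rule["source"] in (source, "VIRTUAL_NETWORK"):
            return rule["access"]
    return "Deny"  # implicit deny if nothing matched

print(evaluate("business-subnet"))  # Allow
print(evaluate("web-subnet"))       # Deny: falls through to the VNet-wide deny
```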

SQL Server Always On Availability Groups

We recommend Always On Availability Groups for SQL Server high availability. Prior to
Windows Server 2016, Always On Availability Groups require a domain controller, and all
nodes in the availability group must be in the same AD domain.

Other tiers connect to the database through an availability group listener. The listener enables
a SQL client to connect without knowing the name of the physical instance of SQL Server.
VMs that access the database must be joined to the domain. The client (in this case, another
tier) uses DNS to resolve the listener's virtual network name into IP addresses.

Configure the SQL Server Always On Availability Group as follows:

1. Create a Windows Server Failover Clustering (WSFC) cluster, a SQL Server Always
On Availability Group, and a primary replica. For more information, see Getting
Started with Always On Availability Groups.
2. Create an internal load balancer with a static private IP address.
3. Create an availability group listener, and map the listener's DNS name to the IP
address of an internal load balancer.
4. Create a load balancer rule for the SQL Server listening port (TCP port 1433 by
default). The load balancer rule must enable floating IP, also called Direct Server
Return. This causes the VM to reply directly to the client, which enables a direct
connection to the primary replica.

Note

When floating IP is enabled, the front-end port number must be the same as the back-
end port number in the load balancer rule.
When a SQL client tries to connect, the load balancer routes the connection request to the
primary replica. If there is a failover to another replica, the load balancer automatically routes
subsequent requests to a new primary replica. For more information, see Configure an ILB
listener for SQL Server Always On Availability Groups.

During a failover, existing client connections are closed. After the failover completes, new
connections will be routed to the new primary replica.

If your application makes significantly more reads than writes, you can offload some of the
read-only queries to a secondary replica. See Using a Listener to Connect to a Read-Only
Secondary Replica (Read-Only Routing).

Test your deployment by forcing a manual failover of the availability group.

Jumpbox

Do not allow RDP access from the public Internet to the VMs that run the application
workload. Instead, all RDP access to these VMs must come through the jumpbox. An
administrator logs into the jumpbox, and then logs into the other VM from the jumpbox. The
jumpbox allows RDP traffic from the Internet, but only from known, safe IP addresses.

The jumpbox has minimal performance requirements, so select a small VM size. Create a
public IP address for the jumpbox. Place the jumpbox in the same VNet as the other VMs,
but in a separate management subnet.

To secure the jumpbox, add an NSG rule that allows RDP connections only from a safe set of
public IP addresses. Configure the NSGs for the other subnets to allow RDP traffic from the
management subnet.

Scalability considerations
VM scale sets help you to deploy and manage a set of identical VMs. Scale sets support
autoscaling based on performance metrics. As the load on the VMs increases, additional VMs
are automatically added to the load balancer. Consider scale sets if you need to quickly scale
out VMs, or need to autoscale.

There are two basic ways to configure VMs deployed in a scale set:

 Use extensions to configure the VM after it is provisioned. With this approach, new
VM instances may take longer to start up than a VM with no extensions.
 Deploy a managed disk with a custom disk image. This option may be quicker to
deploy. However, it requires you to keep the image up to date.

For additional considerations, see Design considerations for scale sets.

Tip

When using any autoscale solution, test it with production-level workloads well in advance.
Each Azure subscription has default limits in place, including a maximum number of VMs
per region. You can increase the limit by filing a support request. For more information, see
Azure subscription and service limits, quotas, and constraints.

Availability considerations
If you are not using VM scale sets, put VMs in the same tier into an availability set. Create at
least two VMs in the availability set to support the availability SLA for Azure VMs. For
more information, see Manage the availability of virtual machines.

The load balancer uses health probes to monitor the availability of VM instances. If a probe
cannot reach an instance within a timeout period, the load balancer stops sending traffic to
that VM. However, the load balancer will continue to probe, and if the VM becomes
available again, the load balancer resumes sending traffic to that VM.

Here are some recommendations on load balancer health probes:

 Probes can test either HTTP or TCP. If your VMs run an HTTP server, create an HTTP probe.
Otherwise create a TCP probe.
 For an HTTP probe, specify the path to an HTTP endpoint. The probe checks for an HTTP 200
response from this path. This can be the root path ("/"), or a health-monitoring endpoint
that implements some custom logic to check the health of the application. The endpoint
must allow anonymous HTTP requests.
 The probe is sent from a known IP address, 168.63.129.16. Make sure you don't block traffic
to or from this IP address in any firewall policies or network security group (NSG) rules.
 Use health probe logs to view the status of the health probes. Enable logging in the Azure
portal for each load balancer. Logs are written to Azure Blob storage. The logs show how
many VMs on the back end are not receiving network traffic due to failed probe responses.
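The recommendations above can be illustrated with a minimal health-monitoring endpoint. This is a sketch only: the /healthcheck path and the health check itself are assumptions, and the only contract is an anonymous HTTP 200 response when the application is healthy.

```python
# Sketch of a health-monitoring endpoint for an HTTP load balancer probe.
# The /healthcheck path and the checks themselves are assumptions; the
# probe treats anything other than HTTP 200 as unhealthy.
import http.server
import threading

def application_is_healthy():
    # Placeholder for custom logic (database reachable, queue depth, etc.).
    return True

class HealthHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthcheck" and application_is_healthy():
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"OK")
        else:
            self.send_response(503)  # non-200 marks the instance unhealthy
            self.end_headers()

    def log_message(self, *args):
        pass  # keep probe traffic out of the logs

def start_health_server(port=0):
    """Start the endpoint on a background thread; port 0 picks a free port."""
    server = http.server.HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Because the endpoint allows anonymous requests and returns a plain 200, it works unchanged for either an Azure Load Balancer HTTP probe or a Traffic Manager probe.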

If you need higher availability than the Azure SLA for VMs provides, consider replicating
the application across two regions, using Azure Traffic Manager for failover. For more
information, see Multi-region N-tier application for high availability.

Security considerations
Virtual networks are a traffic isolation boundary in Azure. VMs in one VNet cannot
communicate directly with VMs in a different VNet. VMs within the same VNet can
communicate, unless you create network security groups (NSGs) to restrict traffic. For more
information, see Microsoft cloud services and network security.

Consider adding a network virtual appliance (NVA) to create a DMZ between the Internet
and the Azure virtual network. NVA is a generic term for a virtual appliance that can perform
network-related tasks, such as firewall, packet inspection, auditing, and custom routing. For
more information, see Implementing a DMZ between Azure and the Internet.

Encrypt sensitive data at rest and use Azure Key Vault to manage the database encryption
keys. Key Vault can store encryption keys in hardware security modules (HSMs). For more
information, see Configure Azure Key Vault Integration for SQL Server on Azure VMs. It's
also recommended to store application secrets, such as database connection strings, in Key
Vault.

We recommend enabling DDoS Protection Standard, which provides additional DDoS
mitigation for resources in a VNet. Although basic DDoS protection is automatically enabled
as part of the Azure platform, DDoS Protection Standard provides mitigation capabilities that
are tuned specifically to Azure Virtual Network resources.

Deploy the solution

A deployment for this reference architecture is available on GitHub. Note that the entire
deployment can take up to two hours, which includes running the scripts to configure AD DS,
the Windows Server failover cluster, and the SQL Server availability group.

Prerequisites

1. Clone, fork, or download the zip file for the reference architectures GitHub
repository.
2. Install Azure CLI 2.0.
3. Install the Azure building blocks npm package.

bash

npm install -g @mspnp/azure-building-blocks

4. From a command prompt, bash prompt, or PowerShell prompt, sign into your Azure
account as follows:

bash

az login
Deploy the solution

1. Run the following command to create a resource group.

bash

az group create --location <location> --name <resource-group-name>

2. Run the following command to create a Storage account for the Cloud Witness.

bash

az storage account create --location <location> \
    --name <storage-account-name> \
    --resource-group <resource-group-name> \
    --sku Standard_LRS

3. Navigate to the virtual-machines\n-tier-windows folder of the reference
architectures GitHub repository.

4. Open the n-tier-windows.json file.

5. Search for all instances of "witnessStorageBlobEndPoint" and replace the placeholder text
with the name of the Storage account from step 2.

JSON

"witnessStorageBlobEndPoint": "https://[replace-with-storageaccountname].blob.core.windows.net",

6. Run the following command to list the account keys for the storage account.

bash

az storage account keys list \
    --account-name <storage-account-name> \
    --resource-group <resource-group-name>

The output should look like the following. Copy the value of key1.

JSON

[
  {
    "keyName": "key1",
    "permissions": "Full",
    "value": "..."
  },
  {
    "keyName": "key2",
    "permissions": "Full",
    "value": "..."
  }
]

7. In the n-tier-windows.json file, search for all instances of
"witnessStorageAccountKey" and paste in the account key.

JSON

"witnessStorageAccountKey": "[replace-with-storagekey]"

8. In the n-tier-windows.json file, search for all instances of [replace-with-password]
and [replace-with-sql-password] and replace them with a strong password. Save the file.

Note

If you change the administrator user name, you must also update the extensions blocks in
the JSON file.

9. Run the following command to deploy the architecture.

bash

azbb -s <your subscription_id> -g <resource_group_name> -l <location> -p n-tier-windows.json --deploy

For more information on deploying this sample reference architecture using Azure Building
Blocks, visit the GitHub repository.

Multi-region N-tier application for high availability

This reference architecture shows a set of proven practices for running an N-tier application
in multiple Azure regions, in order to achieve availability and a robust disaster recovery
infrastructure.

Architecture
This architecture builds on the one shown in N-tier application with SQL Server.

 Primary and secondary regions. Use two regions to achieve higher availability. One
is the primary region. The other region is for failover.
 Azure Traffic Manager. Traffic Manager routes incoming requests to one of the
regions. During normal operations, it routes requests to the primary region. If that
region becomes unavailable, Traffic Manager fails over to the secondary region. For
more information, see the section Traffic Manager configuration.
 Resource groups. Create separate resource groups for the primary region, the
secondary region, and for Traffic Manager. This gives you the flexibility to manage
each region as a single collection of resources. For example, you could redeploy one
region, without taking down the other one. Link the resource groups, so that you can
run a query to list all the resources for the application.
 VNets. Create a separate VNet for each region. Make sure the address spaces do not
overlap.
 SQL Server Always On Availability Group. If you are using SQL Server, we
recommend SQL Always On Availability Groups for high availability. Create a single
availability group that includes the SQL Server instances in both regions.

Note

Also consider Azure SQL Database, which provides a relational database as a cloud
service. With SQL Database, you don't need to configure an availability group or
manage failover.

 VPN Gateways. Create a VPN gateway in each VNet, and configure a VNet-to-VNet
connection, to enable network traffic between the two VNets. This is required for the
SQL Always On Availability Group.

Recommendations
A multi-region architecture can provide higher availability than deploying to a single region.
If a regional outage affects the primary region, you can use Traffic Manager to fail over to the
secondary region. This architecture can also help if an individual subsystem of the application
fails.

There are several general approaches to achieving high availability across regions:

 Active/passive with hot standby. Traffic goes to one region, while the other waits on hot
standby. Hot standby means the VMs in the secondary region are allocated and running at
all times.
 Active/passive with cold standby. Traffic goes to one region, while the other waits on cold
standby. Cold standby means the VMs in the secondary region are not allocated until
needed for failover. This approach costs less to run, but will generally take longer to come
online during a failure.
 Active/active. Both regions are active, and requests are load balanced between them. If one
region becomes unavailable, it is taken out of rotation.

This reference architecture focuses on active/passive with hot standby, using Traffic Manager
for failover. Note that you could deploy a small number of VMs for hot standby and then
scale out as needed.
Regional pairing

Each Azure region is paired with another region within the same geography. In general,
choose regions from the same regional pair (for example, East US 2 and Central US).
Benefits of doing so include:

 If there is a broad outage, recovery of at least one region out of every pair is prioritized.
 Planned Azure system updates are rolled out to paired regions sequentially, to minimize
possible downtime.
 Pairs reside within the same geography, to meet data residency requirements.

However, make sure that both regions support all of the Azure services needed for your
application (see Services by region). For more information about regional pairs, see Business
continuity and disaster recovery (BCDR): Azure Paired Regions.

Traffic Manager configuration

Consider the following points when configuring Traffic Manager:

 Routing. Traffic Manager supports several routing algorithms. For the scenario described in
this article, use priority routing (formerly called failover routing). With this setting, Traffic
Manager sends all requests to the primary region, unless the primary region becomes
unreachable. At that point, it automatically fails over to the secondary region. See Configure
Failover routing method.
 Health probe. Traffic Manager uses an HTTP (or HTTPS) probe to monitor the availability of
each region. The probe checks for an HTTP 200 response for a specified URL path. As a best
practice, create an endpoint that reports the overall health of the application, and use this
endpoint for the health probe. Otherwise, the probe might report a healthy endpoint when
critical parts of the application are actually failing. For more information, see Health
Endpoint Monitoring Pattern.

When Traffic Manager fails over there is a period of time when clients cannot reach the
application. The duration is affected by the following factors:

 The health probe must detect that the primary region has become unreachable.
 DNS servers must update the cached DNS records for the IP address, which depends on the
DNS time-to-live (TTL). The default TTL is 300 seconds (5 minutes), but you can configure this
value when you create the Traffic Manager profile.

For details, see About Traffic Manager Monitoring.
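The two factors above can be combined into a back-of-envelope estimate of the client-visible failover window. The probe interval and tolerated-failure count below are illustrative assumptions, not fixed Traffic Manager settings.

```python
# Back-of-envelope estimate of the client-visible Traffic Manager failover
# window. The probe interval and tolerated-failure count are illustrative
# assumptions; the DNS TTL default in this article is 300 seconds.

def failover_window_seconds(probe_interval, tolerated_failures, dns_ttl):
    # Time to declare the endpoint degraded: the probe must fail
    # (tolerated_failures + 1) consecutive times...
    detection = probe_interval * (tolerated_failures + 1)
    # ...and then cached DNS answers must expire before clients re-resolve
    # to the secondary region's IP address.
    return detection + dns_ttl

# Example: 30-second probes, 3 tolerated failures, default 300-second TTL
# gives a worst case of roughly 420 seconds (7 minutes).
worst_case = failover_window_seconds(30, 3, 300)
```

Lowering the TTL on the Traffic Manager profile shrinks the second term, at the cost of more DNS queries during normal operation.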

If Traffic Manager fails over, we recommend performing a manual failback rather than
implementing an automatic failback. Otherwise, you can create a situation where the
application flips back and forth between regions. Verify that all application subsystems are
healthy before failing back.

Note that Traffic Manager automatically fails back by default. To prevent this, manually
lower the priority of the primary region after a failover event. For example, suppose the
primary region is priority 1 and the secondary is priority 2. After a failover, set the primary
region to priority 3, to prevent automatic failback. When you are ready to switch back, update
the priority to 1.

The following Azure CLI command updates the priority:

bat
az network traffic-manager endpoint update --resource-group <resource-
group> --profile-name <profile>
--name <endpoint-name> --type azureEndpoints --priority 3

Another approach is to temporarily disable the endpoint until you are ready to fail back:

bat
az network traffic-manager endpoint update --resource-group <resource-
group> --profile-name <profile>
--name <endpoint-name> --type azureEndpoints --endpoint-status Disabled

Depending on the cause of a failover, you might need to redeploy the resources within a
region. Before failing back, perform an operational readiness test. The test should verify
things like:

 VMs are configured correctly. (All required software is installed, IIS is running, and so on.)
 Application subsystems are healthy.
 Functional testing. (For example, the database tier is reachable from the web tier.)

Configure SQL Server Always On Availability Groups

Prior to Windows Server 2016, SQL Server Always On Availability Groups require a domain
controller, and all nodes in the availability group must be in the same Active Directory (AD)
domain.

To configure the availability group:

 At a minimum, place two domain controllers in each region.
 Give each domain controller a static IP address.
 Create a VNet-to-VNet connection to enable communication between the VNets.
 For each VNet, add the IP addresses of the domain controllers (from both regions) to
the DNS server list. You can use the following CLI command. For more information,
see Change DNS servers.

bat

az network vnet update --resource-group <resource-group> --name <vnet-name> --dns-servers "10.0.0.4,10.0.0.6,172.16.0.4,172.16.0.6"

 Create a Windows Server Failover Clustering (WSFC) cluster that includes the SQL
Server instances in both regions.
 Create a SQL Server Always On Availability Group that includes the SQL Server
instances in both the primary and secondary regions. See Extending Always On
Availability Group to Remote Azure Datacenter (PowerShell) for the steps.
o Put the primary replica in the primary region.
o Put one or more secondary replicas in the primary region. Configure these to
use synchronous commit with automatic failover.
o Put one or more secondary replicas in the secondary region. Configure these to
use asynchronous commit, for performance reasons. (Otherwise, all T-SQL
transactions have to wait on a round trip over the network to the secondary
region.)

Note

Asynchronous commit replicas do not support automatic failover.

Availability considerations
With a complex N-tier app, you may not need to replicate the entire application in the
secondary region. Instead, you might just replicate a critical subsystem that is needed to
support business continuity.

Traffic Manager is a possible failure point in the system. If the Traffic Manager service fails,
clients cannot access your application during the downtime. Review the Traffic Manager
SLA, and determine whether using Traffic Manager alone meets your business requirements
for high availability. If not, consider adding another traffic management solution as a
failback. If the Azure Traffic Manager service fails, change your CNAME records in DNS to
point to the other traffic management service. (This step must be performed manually, and
your application will be unavailable until the DNS changes are propagated.)

For the SQL Server cluster, there are two failover scenarios to consider:

 All of the SQL Server database replicas in the primary region fail. For example, this
could happen during a regional outage. In that case, you must manually fail over the
availability group, even though Traffic Manager automatically fails over on the front
end. Follow the steps in Perform a Forced Manual Failover of a SQL Server
Availability Group, which describes how to perform a forced failover by using SQL
Server Management Studio, Transact-SQL, or PowerShell in SQL Server 2016.

Warning

With forced failover, there is a risk of data loss. Once the primary region is back
online, take a snapshot of the database and use tablediff to find the differences.

 Traffic Manager fails over to the secondary region, but the primary SQL Server
database replica is still available. For example, the front-end tier might fail, without
affecting the SQL Server VMs. In that case, Internet traffic is routed to the secondary
region, and that region can still connect to the primary replica. However, there will be
increased latency, because the SQL Server connections are going across regions. In
this situation, you should perform a manual failover as follows:
1. Temporarily switch a SQL Server database replica in the secondary region to
synchronous commit. This ensures there won't be data loss during the failover.
2. Fail over to that replica.
3. When you fail back to the primary region, restore the asynchronous commit setting.

Manageability considerations
When you update your deployment, update one region at a time to reduce the chance of a
global failure from an incorrect configuration or an error in the application.

Test the resiliency of the system to failures. Here are some common failure scenarios to test:

 Shut down VM instances.
 Pressure resources such as CPU and memory.
 Disconnect/delay network.
 Crash processes.
 Expire certificates.
 Simulate hardware faults.
 Shut down the DNS service on the domain controllers.

Measure the recovery times and verify they meet your business requirements. Test
combinations of failure modes, as well.

N-tier application with Apache Cassandra

This reference architecture shows how to deploy VMs and a virtual network configured for
an N-tier application, using Apache Cassandra on Linux for the data tier.

Architecture
The architecture has the following components:

 Resource group. Resource groups are used to group resources so they can be
managed by lifetime, owner, or other criteria.
 Virtual network (VNet) and subnets. Every Azure VM is deployed into a VNet that
can be segmented into multiple subnets. Create a separate subnet for each tier.
 NSGs. Use network security groups (NSGs) to restrict network traffic within the
VNet. For example, in the 3-tier architecture shown here, the database tier does not
accept traffic from the web front end, only from the business tier and the management
subnet.
 Virtual machines. For recommendations on configuring VMs, see Run a Windows
VM on Azure and Run a Linux VM on Azure.
 Availability sets. Create an availability set for each tier, and provision at least two
VMs in each tier. This makes the VMs eligible for a higher service level agreement
(SLA) for VMs.
 VM scale set (not shown). A VM scale set is an alternative to using an availability
set. A scale set makes it easy to scale out the VMs in a tier, either manually or
automatically based on predefined rules.
 Azure Load balancers. The load balancers distribute incoming Internet requests to
the VM instances. Use a public load balancer to distribute incoming Internet traffic to
the web tier, and an internal load balancer to distribute network traffic from the web
tier to the business tier.
 Public IP address. A public IP address is needed for the public load balancer to
receive Internet traffic.
 Jumpbox. Also called a bastion host. A secure VM on the network that
administrators use to connect to the other VMs. The jumpbox has an NSG that allows
remote traffic only from public IP addresses on a safe list. The NSG should permit ssh
traffic.
 Apache Cassandra database. Provides high availability at the data tier, by enabling
replication and failover.
 Azure DNS. Azure DNS is a hosting service for DNS domains, providing name
resolution using Microsoft Azure infrastructure. By hosting your domains in Azure,
you can manage your DNS records using the same credentials, APIs, tools, and billing
as your other Azure services.

Recommendations
Your requirements might differ from the architecture described here. Use these
recommendations as a starting point.

VNet / Subnets

When you create the VNet, determine how many IP addresses your resources in each subnet
require. Specify a subnet mask and a VNet address range large enough for the required IP
addresses, using CIDR notation. Use an address space that falls within the standard private IP
address blocks, which are 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16.
Choose an address range that does not overlap with your on-premises network, in case you
need to set up a gateway between the VNet and your on-premises network later. Once you
create the VNet, you can't change the address range.

Design subnets with functionality and security requirements in mind. All VMs within the
same tier or role should go into the same subnet, which can be a security boundary. For more
information about designing VNets and subnets, see Plan and design Azure Virtual Networks.
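The address-planning rules above can be checked mechanically with the standard-library ipaddress module. All ranges in this sketch are hypothetical examples.

```python
# Sketch: validating a VNet address plan with the ipaddress module.
# All ranges here are hypothetical examples.
import ipaddress

vnet = ipaddress.ip_network("10.1.0.0/16")
on_premises = ipaddress.ip_network("10.0.0.0/16")

subnets = {
    "web":        ipaddress.ip_network("10.1.0.0/24"),
    "business":   ipaddress.ip_network("10.1.1.0/24"),
    "database":   ipaddress.ip_network("10.1.2.0/24"),
    "management": ipaddress.ip_network("10.1.3.0/24"),
}

def plan_is_valid(vnet, subnets, on_premises):
    # The VNet range must not collide with the on-premises network,
    # because a gateway may connect the two later.
    if vnet.overlaps(on_premises):
        return False
    nets = list(subnets.values())
    for i, net in enumerate(nets):
        if not net.subnet_of(vnet):     # each subnet must fall inside the VNet
            return False
        for other in nets[i + 1:]:
            if net.overlaps(other):     # subnets must not overlap each other
                return False
    return True
```

Running this check before deployment catches overlap mistakes that cannot be fixed later, since the VNet address range is immutable once created.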

Load balancers

Do not expose the VMs directly to the Internet, but instead give each VM a private IP
address. Clients connect using the IP address of the public load balancer.

Define load balancer rules to direct network traffic to the VMs. For example, to enable HTTP
traffic, create a rule that maps port 80 from the front-end configuration to port 80 on the
back-end address pool. When a client sends an HTTP request to port 80, the load balancer
selects a back-end IP address by using a hashing algorithm that includes the source IP
address. In that way, client requests are distributed across all the VMs.
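The selection behavior can be modeled with a simple flow hash. This is a simplified stand-in for illustration, not Azure Load Balancer's internal algorithm; the key property it shows is that the same flow tuple always maps to the same back-end VM.

```python
# Illustrative model of hash-based load balancing: a flow tuple that
# includes the source IP address is hashed to pick a back-end VM.
# This is NOT Azure Load Balancer's actual hash, just a sketch of the idea.
import hashlib

def pick_backend(src_ip, src_port, dst_ip, dst_port, protocol, backends):
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(backends)
    return backends[index]

# Hypothetical back-end pool for the web tier.
backends = ["10.1.0.4", "10.1.0.5", "10.1.0.6"]
```

Because the hash is deterministic, packets belonging to one flow always reach the same VM, while different clients spread across the pool.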

Network security groups

Use NSG rules to restrict traffic between tiers. For example, in the 3-tier architecture shown
above, the web tier does not communicate directly with the database tier. To enforce this, the
database tier should block incoming traffic from the web tier subnet.

1. Deny all inbound traffic from the VNet. (Use the VIRTUAL_NETWORK tag in the rule.)
2. Allow inbound traffic from the business tier subnet.
3. Allow inbound traffic from the database tier subnet itself. This rule allows communication
between the database VMs, which is needed for database replication and failover.
4. Allow ssh traffic (port 22) from the jumpbox subnet. This rule lets administrators connect to
the database tier from the jumpbox.

Create rules 2 – 4 with higher priority than the first rule, so they override it.
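The rule set above can be sketched as a first-match evaluation. In Azure NSGs a lower priority number is evaluated first, and processing stops at the first matching rule; the subnet ranges below are hypothetical.

```python
# Sketch: first-match evaluation of the database-tier NSG rules above.
# Lower priority numbers are evaluated first; subnet ranges are hypothetical.
import ipaddress

# (priority, source network, destination port or None for any, action)
rules = [
    (100, "10.1.1.0/24", None, "allow"),   # business tier subnet
    (110, "10.1.2.0/24", None, "allow"),   # database subnet itself (replication)
    (120, "10.1.3.0/24", 22,   "allow"),   # ssh from the management subnet
    (4000, "10.1.0.0/16", None, "deny"),   # deny everything else from the VNet
]

def evaluate(src_ip, dst_port):
    src = ipaddress.ip_address(src_ip)
    for _, source, port, action in sorted(rules):
        if src in ipaddress.ip_network(source) and port in (None, dst_port):
            return action
    return "deny"  # NSGs end with an implicit deny
```

With this ordering, the business tier and the jumpbox reach the database subnet, while traffic from the web tier falls through to the deny rule.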

Cassandra

We recommend DataStax Enterprise for production use, but these recommendations apply to
any Cassandra edition. For more information on running DataStax in Azure, see DataStax
Enterprise Deployment Guide for Azure.

Put the VMs for a Cassandra cluster in an availability set to ensure that the Cassandra replicas
are distributed across multiple fault domains and upgrade domains. For more information
about fault domains and upgrade domains, see Manage the availability of virtual machines.

Configure three fault domains (the maximum) per availability set and 18 upgrade domains
per availability set. This provides the maximum number of upgrade domains that can still be
distributed evenly across the fault domains.
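The arithmetic behind the 18 can be sketched as follows: VMs are placed across fault domains round-robin, and (assuming the usual limit of 20 upgrade domains per availability set) 18 is the largest count that splits evenly across 3 fault domains, at 6 apiece.

```python
# Sketch: why 18 upgrade domains pair well with 3 fault domains.
# Upgrade domains map onto fault domains round-robin; 18 is the largest
# count under the assumed 20-UD platform limit that divides evenly by 3.
from collections import Counter

def distribute(upgrade_domains, fault_domains):
    """Map each upgrade domain to a fault domain, round-robin."""
    return {ud: ud % fault_domains for ud in range(upgrade_domains)}

def is_even(upgrade_domains, fault_domains):
    counts = Counter(distribute(upgrade_domains, fault_domains).values())
    return len(set(counts.values())) == 1  # every fault domain gets the same share
```

An uneven split would mean one fault domain hosts more upgrade domains than the others, concentrating impact during planned maintenance.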

Configure nodes in rack-aware mode. Map fault domains to racks in the cassandra-
rackdc.properties file.
You don't need a load balancer in front of the cluster. The client connects directly to a node in
the cluster.

For high availability, deploy Cassandra in more than one Azure region. Within each region,
nodes are configured in rack-aware mode with fault and upgrade domains, for resiliency
inside the region.

Jumpbox

Do not allow ssh access from the public Internet to the VMs that run the application
workload. Instead, all ssh access to these VMs must come through the jumpbox. An
administrator logs into the jumpbox, and then logs into the other VM from the jumpbox. The
jumpbox allows ssh traffic from the Internet, but only from known, safe IP addresses.

The jumpbox has minimal performance requirements, so select a small VM size. Create a
public IP address for the jumpbox. Place the jumpbox in the same VNet as the other VMs,
but in a separate management subnet.

To secure the jumpbox, add an NSG rule that allows ssh connections only from a safe set of
public IP addresses. Configure the NSGs for the other subnets to allow ssh traffic from the
management subnet.

Scalability considerations
VM scale sets help you to deploy and manage a set of identical VMs. Scale sets support
autoscaling based on performance metrics. As the load on the VMs increases, additional VMs
are automatically added to the load balancer. Consider scale sets if you need to quickly scale
out VMs, or need to autoscale.

There are two basic ways to configure VMs deployed in a scale set:

 Use extensions to configure the VM after it is provisioned. With this approach, new
VM instances may take longer to start up than a VM with no extensions.
 Deploy a managed disk with a custom disk image. This option may be quicker to
deploy. However, it requires you to keep the image up to date.

For additional considerations, see Design considerations for scale sets.

Tip

When using any autoscale solution, test it with production-level workloads well in advance.

Each Azure subscription has default limits in place, including a maximum number of VMs
per region. You can increase the limit by filing a support request. For more information, see
Azure subscription and service limits, quotas, and constraints.

Availability considerations
If you are not using VM scale sets, put VMs in the same tier into an availability set. Create at
least two VMs in the availability set to support the availability SLA for Azure VMs. For
more information, see Manage the availability of virtual machines.

The load balancer uses health probes to monitor the availability of VM instances. If a probe
cannot reach an instance within a timeout period, the load balancer stops sending traffic to
that VM. However, the load balancer will continue to probe, and if the VM becomes
available again, the load balancer resumes sending traffic to that VM.

Here are some recommendations on load balancer health probes:

 Probes can test either HTTP or TCP. If your VMs run an HTTP server, create an HTTP probe.
Otherwise create a TCP probe.
 For an HTTP probe, specify the path to an HTTP endpoint. The probe checks for an HTTP 200
response from this path. This can be the root path ("/"), or a health-monitoring endpoint
that implements some custom logic to check the health of the application. The endpoint
must allow anonymous HTTP requests.
 The probe is sent from a known IP address, 168.63.129.16. Make sure you don't block traffic
to or from this IP address in any firewall policies or network security group (NSG) rules.
 Use health probe logs to view the status of the health probes. Enable logging in the Azure
portal for each load balancer. Logs are written to Azure Blob storage. The logs show how
many VMs on the back end are not receiving network traffic due to failed probe responses.

For the Cassandra cluster, the failover scenarios to consider depend on the consistency levels
used by the application, as well as the number of replicas used. For consistency levels and
usage in Cassandra, see Configuring data consistency and Cassandra: How many nodes are
talked to with Quorum? Data availability in Cassandra is determined by the consistency level
used by the application and the replication mechanism. For replication in Cassandra, see Data
Replication in NoSQL Databases Explained.
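The interaction between consistency level and replica count can be sketched with the standard quorum arithmetic: a QUORUM read or write needs a majority of replicas, so the replication factor determines how many node failures the cluster tolerates.

```python
# Sketch of Cassandra quorum arithmetic: QUORUM operations need a
# majority of the replicas for a partition.

def quorum(replication_factor):
    return replication_factor // 2 + 1

def tolerated_failures(replication_factor):
    """Replica losses a QUORUM operation can survive."""
    return replication_factor - quorum(replication_factor)

# With RF=3, QUORUM needs 2 replicas and tolerates 1 node failure;
# with RF=5, it needs 3 and tolerates 2.
```

This is why even-numbered replication factors buy little extra availability at QUORUM: going from RF=3 to RF=4 raises the quorum from 2 to 3 while still tolerating only one failure.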

Security considerations
Virtual networks are a traffic isolation boundary in Azure. VMs in one VNet cannot
communicate directly with VMs in a different VNet. VMs within the same VNet can
communicate, unless you create network security groups (NSGs) to restrict traffic. For more
information, see Microsoft cloud services and network security.

For incoming Internet traffic, the load balancer rules define which traffic can reach the back
end. However, load balancer rules don't support IP safe lists, so if you want to add certain
public IP addresses to a safe list, add an NSG to the subnet.

Consider adding a network virtual appliance (NVA) to create a DMZ between the Internet
and the Azure virtual network. NVA is a generic term for a virtual appliance that can perform
network-related tasks, such as firewall, packet inspection, auditing, and custom routing. For
more information, see Implementing a DMZ between Azure and the Internet.

Encrypt sensitive data at rest and use Azure Key Vault to manage the database encryption
keys. Key Vault can store encryption keys in hardware security modules (HSMs). It's also
recommended to store application secrets, such as database connection strings, in Key Vault.
We recommend enabling DDoS Protection Standard, which provides additional DDoS
mitigation for resources in a VNet. Although basic DDoS protection is automatically enabled
as part of the Azure platform, DDoS Protection Standard provides mitigation capabilities that
are tuned specifically to Azure Virtual Network resources.

DMZ between Azure and your on-premises datacenter
This reference architecture shows a secure hybrid network that extends an on-premises
network to Azure. The architecture implements a DMZ, also called a perimeter network,
between the on-premises network and an Azure virtual network (VNet). The DMZ includes
network virtual appliances (NVAs) that implement security functionality such as firewalls
and packet inspection. All outgoing traffic from the VNet is force-tunneled to the Internet
through the on-premises network, so that it can be audited. Deploy this solution.

This architecture requires a connection to your on-premises datacenter, using either a VPN
gateway or an ExpressRoute connection. Typical uses for this architecture include:

 Hybrid applications where workloads run partly on-premises and partly in Azure.
 Infrastructure that requires granular control over traffic entering an Azure VNet from an on-
premises datacenter.
 Applications that must audit outgoing traffic. This is often a regulatory requirement of many
commercial systems and can help to prevent public disclosure of private information.

Architecture
The architecture consists of the following components.

 On-premises network. A private local-area network implemented in an organization.
 Azure virtual network (VNet). The VNet hosts the application and other resources
running in Azure.
 Gateway. The gateway provides connectivity between the routers in the on-premises
network and the VNet.
 Network virtual appliance (NVA). NVA is a generic term that describes a VM
performing tasks such as allowing or denying access as a firewall, optimizing wide
area network (WAN) operations (including network compression), custom routing, or
other network functionality.
 Web tier, business tier, and data tier subnets. Subnets hosting the VMs and
services that implement an example 3-tier application running in the cloud. See
Running Windows VMs for an N-tier architecture on Azure for more information.
 User defined routes (UDR). User defined routes define the flow of IP traffic within
Azure VNets.

Note

Depending on the requirements of your VPN connection, you can configure Border
Gateway Protocol (BGP) routes instead of using UDRs to implement the forwarding
rules that direct traffic back through the on-premises network.

 Management subnet. This subnet contains VMs that implement management and
monitoring capabilities for the components running in the VNet.
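To illustrate the UDR component above, the following Azure CLI sketch creates a route table whose default route sends all traffic to the NVA pool for inspection. The resource group name and the NVA load balancer IP address are hypothetical placeholders, not values from the reference deployment:

```shell
# Hypothetical names; adjust to your deployment.
RG="dmz-network-rg"
NVA_ILB_IP="10.0.3.10"   # assumed private IP of the load balancer in front of the NVAs

if command -v az >/dev/null 2>&1; then
  # Route table to associate with the gateway subnet.
  az network route-table create --resource-group "$RG" --name gateway-udr

  # Default route: send all traffic to the NVA pool for inspection.
  az network route-table route create --resource-group "$RG" \
    --route-table-name gateway-udr --name to-nva \
    --address-prefix 0.0.0.0/0 \
    --next-hop-type VirtualAppliance \
    --next-hop-ip-address "$NVA_ILB_IP"
else
  echo "Azure CLI not installed; commands shown for illustration only."
fi
```

After creating the route table, associate it with the gateway subnet so the forwarding rules take effect.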

Recommendations
The following recommendations apply for most scenarios. Follow these recommendations
unless you have a specific requirement that overrides them.

Access control recommendations

Use Role-Based Access Control (RBAC) to manage the resources in your application.
Consider creating the following custom roles:

 A DevOps role with permissions to administer the infrastructure for the application,
deploy the application components, and monitor and restart VMs.
 A centralized IT administrator role to manage and monitor network resources.
 A security IT administrator role to manage secure network resources such as the
NVAs.

The DevOps and IT administrator roles should not have access to the NVA resources. This
should be restricted to the security IT administrator role.
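As a sketch of how the security IT administrator role might be defined, the following creates a custom role definition and registers it with the Azure CLI. The role name, the specific actions granted, and the subscription placeholder are all illustrative; tailor them to how your NVAs are actually deployed:

```shell
# Hypothetical custom role; adjust the actions and substitute your subscription ID.
ROLE_FILE="$(mktemp)"
cat > "$ROLE_FILE" <<'EOF'
{
  "Name": "Security IT Administrator (custom)",
  "IsCustom": true,
  "Description": "Manages the NVAs and related secure network resources only.",
  "Actions": [
    "Microsoft.Network/networkSecurityGroups/*",
    "Microsoft.Network/routeTables/*",
    "Microsoft.Compute/virtualMachines/read",
    "Microsoft.Compute/virtualMachines/restart/action"
  ],
  "NotActions": [],
  "AssignableScopes": [ "/subscriptions/<subscription-id>" ]
}
EOF

if command -v az >/dev/null 2>&1; then
  az role definition create --role-definition "$ROLE_FILE"
else
  echo "Azure CLI not installed; role definition written to $ROLE_FILE for illustration."
fi
```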

Resource group recommendations

Azure resources such as VMs, VNets, and load balancers can be easily managed by grouping
them together into resource groups. Assign RBAC roles to each resource group to restrict
access.

We recommend creating the following resource groups:

 A resource group containing the VNet (excluding the VMs), NSGs, and the gateway resources
for connecting to the on-premises network. Assign the centralized IT administrator role to
this resource group.
 A resource group containing the VMs for the NVAs (including the load balancer), the
jumpbox and other management VMs, and the UDR for the gateway subnet that forces all
traffic through the NVAs. Assign the security IT administrator role to this resource group.
 Separate resource groups for each application tier that contain the load balancer and VMs.
Note that this resource group shouldn't include the subnets for each tier. Assign the DevOps
role to this resource group.
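The grouping above can be sketched with the Azure CLI as follows. The group names, region, and assignee are hypothetical, and the custom role is assumed to exist already:

```shell
# Hypothetical group names, region, and assignee; substitute your own values.
LOCATION="eastus"
NETWORK_RG="dmz-network-rg"
NVA_RG="dmz-nva-rg"
WEB_RG="dmz-web-tier-rg"

if command -v az >/dev/null 2>&1; then
  for rg in "$NETWORK_RG" "$NVA_RG" "$WEB_RG"; do
    az group create --name "$rg" --location "$LOCATION"
  done
  # Scope the security IT administrator role (assumed to exist) to the NVA group.
  az role assignment create --assignee security-admins@example.com \
    --role "Security IT Administrator (custom)" --resource-group "$NVA_RG"
else
  echo "Azure CLI not installed; commands shown for illustration only."
fi
```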

Virtual network gateway recommendations

On-premises traffic passes to the VNet through a virtual network gateway. We recommend an
Azure VPN gateway or an Azure ExpressRoute gateway.

NVA recommendations

NVAs provide different services for managing and monitoring network traffic. The Azure
Marketplace offers several third-party vendor NVAs that you can use. If none of these third-
party NVAs meet your requirements, you can create a custom NVA using VMs.

For example, the solution deployment for this reference architecture implements an NVA
with the following functionality on a VM:

 Traffic is routed using IP forwarding on the NVA network interfaces (NICs).


 Traffic is permitted to pass through the NVA only if it is appropriate to do so. Each NVA VM
in the reference architecture is a simple Linux router. Inbound traffic arrives on network
interface eth0, and outbound traffic matches rules defined by custom scripts dispatched
through network interface eth1.
 The NVAs can only be configured from the management subnet.
 Traffic routed to the management subnet does not pass through the NVAs. Otherwise, if the
NVAs fail, there would be no route to the management subnet to fix them.
 The VMs for the NVA are placed in an availability set behind a load balancer. The UDR in the
gateway subnet directs NVA requests to the load balancer.

Include a layer-7 NVA to terminate application connections at the NVA level and maintain
affinity with the backend tiers. This guarantees symmetric connectivity, in which response
traffic from the backend tiers returns through the NVA.

Another option to consider is connecting multiple NVAs in series, with each NVA
performing a specialized security task. This allows each security function to be managed on a
per-NVA basis. For example, an NVA implementing a firewall could be placed in series with
an NVA running identity services. The tradeoff for ease of management is the addition of
extra network hops that may increase latency, so ensure that this doesn't affect your
application's performance.

NSG recommendations

The VPN gateway exposes a public IP address for the connection to the on-premises network.
We recommend creating a network security group (NSG) for the inbound NVA subnet, with
rules to block all traffic not originating from the on-premises network.
We also recommend NSGs for each subnet to provide a second level of protection against
inbound traffic bypassing an incorrectly configured or disabled NVA. For example, the web
tier subnet in the reference architecture implements an NSG with a rule to ignore all requests
other than those received from the on-premises network (192.168.0.0/16) or the VNet, and
another rule that ignores all requests not made on port 80.
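The web tier NSG described above might be sketched as follows. The resource group name is a placeholder; the 192.168.0.0/16 prefix follows the reference architecture, and the VirtualNetwork service tag stands in for the VNet address space:

```shell
# Hypothetical resource group; prefixes follow the reference architecture.
RG="dmz-network-rg"

if command -v az >/dev/null 2>&1; then
  az network nsg create --resource-group "$RG" --name web-tier-nsg

  # Allow HTTP only from the on-premises network and the VNet.
  az network nsg rule create --resource-group "$RG" --nsg-name web-tier-nsg \
    --name allow-http --priority 100 --direction Inbound --access Allow \
    --protocol Tcp --destination-port-ranges 80 \
    --source-address-prefixes 192.168.0.0/16 VirtualNetwork

  # Deny everything else inbound at the lowest priority.
  az network nsg rule create --resource-group "$RG" --nsg-name web-tier-nsg \
    --name deny-all --priority 4096 --direction Inbound --access Deny \
    --protocol '*' --destination-port-ranges '*' --source-address-prefixes '*'
else
  echo "Azure CLI not installed; commands shown for illustration only."
fi
```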

Internet access recommendations

Force-tunnel all outbound Internet traffic through your on-premises network using the site-to-
site VPN tunnel, and route to the Internet using network address translation (NAT). This
prevents accidental leakage of any confidential information stored in your data tier and
allows inspection and auditing of all outgoing traffic.
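One way to configure force tunneling on a VPN gateway is to set a default site, which routes 0.0.0.0/0 traffic back through the on-premises network. The resource names below match those used later in the deployment steps but should be treated as assumptions about your environment:

```shell
# Hypothetical resource names; a default site force-tunnels 0.0.0.0/0 on-premises.
RG="dmz-network-rg"
GATEWAY="ra-vpn-vgw"
LOCAL_GW="onprem-vpn-lgw"

if command -v az >/dev/null 2>&1; then
  az network vnet-gateway update --resource-group "$RG" --name "$GATEWAY" \
    --gateway-default-site "$LOCAL_GW"
else
  echo "Azure CLI not installed; commands shown for illustration only."
fi
```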

Note

Don't completely block Internet traffic from the application tiers, as this will prevent these
tiers from using Azure PaaS services that rely on public IP addresses, such as VM diagnostics
logging, downloading of VM extensions, and other functionality. Azure diagnostics also
requires that components can read and write to an Azure Storage account.

Verify that outbound internet traffic is force-tunneled correctly. If you're using a VPN
connection with the routing and remote access service on an on-premises server, use a tool
such as Wireshark or Microsoft Message Analyzer.

Management subnet recommendations

The management subnet contains a jumpbox that performs management and monitoring
functionality. Restrict execution of all secure management tasks to the jumpbox.

Do not create a public IP address for the jumpbox. Instead, create one route to access the
jumpbox through the incoming gateway. Create NSG rules so the management subnet only
responds to requests from the allowed route.

Scalability considerations
The reference architecture uses a load balancer to direct on-premises network traffic to a pool
of NVA devices, which route the traffic. The NVAs are placed in an availability set. This
design allows you to monitor the throughput of the NVAs over time and add NVA devices in
response to increases in load.

The standard SKU VPN gateway supports sustained throughput of up to 100 Mbps. The High
Performance SKU provides up to 200 Mbps. For higher bandwidths, consider upgrading to an
ExpressRoute gateway. ExpressRoute provides up to 10 Gbps bandwidth with lower latency
than a VPN connection.

For more information about the scalability of Azure gateways, see the scalability
consideration section in Implementing a hybrid network architecture with Azure and on-
premises VPN and Implementing a hybrid network architecture with Azure ExpressRoute.
Availability considerations
As mentioned, the reference architecture uses a pool of NVA devices behind a load balancer.
The load balancer uses a health probe to monitor each NVA and will remove any
unresponsive NVAs from the pool.
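A health probe of this kind might be attached to the NVA load balancer as follows. The load balancer name matches the one referenced later in the deployment steps, but the probe port and timings are illustrative assumptions:

```shell
# Hypothetical names; a TCP probe removes unresponsive NVAs from rotation.
RG="dmz-network-rg"
LB="int-dmz-lb"

if command -v az >/dev/null 2>&1; then
  az network lb probe create --resource-group "$RG" --lb-name "$LB" \
    --name nva-health --protocol Tcp --port 80 --interval 15 --threshold 2
else
  echo "Azure CLI not installed; commands shown for illustration only."
fi
```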

If you're using Azure ExpressRoute to provide connectivity between the VNet and on-
premises network, configure a VPN gateway to provide failover if the ExpressRoute
connection becomes unavailable.

For specific information on maintaining availability for VPN and ExpressRoute connections,
see the availability considerations in Implementing a hybrid network architecture with Azure
and on-premises VPN and Implementing a hybrid network architecture with Azure
ExpressRoute.

Manageability considerations
All application and resource monitoring should be performed by the jumpbox in the
management subnet. Depending on your application requirements, you may need additional
monitoring resources in the management subnet. If so, these resources should be accessed
through the jumpbox.

If gateway connectivity from your on-premises network to Azure is down, you can still reach
the jumpbox by deploying a public IP address, adding it to the jumpbox, and remoting in
from the internet.

Each tier's subnet in the reference architecture is protected by NSG rules. You may need to
create a rule to open port 3389 for remote desktop protocol (RDP) access on Windows VMs
or port 22 for secure shell (SSH) access on Linux VMs. Other management and monitoring
tools may require rules to open additional ports.

If you're using ExpressRoute to provide the connectivity between your on-premises
datacenter and Azure, use the Azure Connectivity Toolkit (AzureCT) to monitor and
troubleshoot connection issues.

You can find additional information specifically aimed at monitoring and managing VPN and
ExpressRoute connections in the articles Implementing a hybrid network architecture with
Azure and on-premises VPN and Implementing a hybrid network architecture with Azure
ExpressRoute.

Security considerations
This reference architecture implements multiple levels of security.

Routing all on-premises user requests through the NVA

The UDR in the gateway subnet blocks all user requests other than those received from on-
premises. The UDR passes allowed requests to the NVAs in the private DMZ subnet, and
these requests are passed on to the application if they are allowed by the NVA rules. You can
add other routes to the UDR, but make sure they don't inadvertently bypass the NVAs or
block administrative traffic intended for the management subnet.

The load balancer in front of the NVAs also acts as a security device by ignoring traffic on
ports that are not open in the load balancing rules. The load balancers in the reference
architecture only listen for HTTP requests on port 80 and HTTPS requests on port 443.
Document any additional rules that you add to the load balancers, and monitor traffic to
ensure there are no security issues.

Using NSGs to block/pass traffic between application tiers

Traffic between tiers is restricted by using NSGs. The business tier blocks all traffic that
doesn't originate in the web tier, and the data tier blocks all traffic that doesn't originate in the
business tier. If you have a requirement to expand the NSG rules to allow broader access to
these tiers, weigh these requirements against the security risks. Each new inbound pathway
represents an opportunity for accidental or purposeful data leakage or application damage.

DevOps access

Use RBAC to restrict the operations that DevOps can perform on each tier. When granting
permissions, use the principle of least privilege. Log all administrative operations and
perform regular audits to ensure any configuration changes were planned.

Deploy the solution


A deployment for a reference architecture that implements these recommendations is
available on GitHub.

Prerequisites

1. Clone, fork, or download the zip file for the reference architectures GitHub
repository.
2. Install Azure CLI 2.0.
3. Install the Azure building blocks npm package:

bash

npm install -g @mspnp/azure-building-blocks

4. From a command prompt, bash prompt, or PowerShell prompt, sign into your Azure
account as follows:

bash

az login
Deploy resources

1. Navigate to the /dmz/secure-vnet-hybrid folder of the reference architectures
GitHub repository.
2. Run the following command:

bash

azbb -s <subscription_id> -g <resource_group_name> -l <region> -p onprem.json --deploy

3. Run the following command:

bash

azbb -s <subscription_id> -g <resource_group_name> -l <region> -p secure-vnet-hybrid.json --deploy

Connect the on-premises and Azure gateways

In this step, you will connect the two local network gateways.

1. In the Azure Portal, navigate to the resource group that you created.
2. Find the resource named ra-vpn-vgw-pip and copy the IP address shown in the
Overview blade.
3. Find the resource named onprem-vpn-lgw.
4. Click the Configuration blade. Under IP address, paste in the IP address from step
2.
5. Click Save and wait for the operation to complete. It can take about 5 minutes.
6. Find the resource named onprem-vpn-gateway1-pip. Copy the IP address shown in
the Overview blade.
7. Find the resource named ra-vpn-lgw.
8. Click the Configuration blade. Under IP address, paste in the IP address from step
6.
9. Click Save and wait for the operation to complete.
10. To verify the connection, go to the Connections blade for each gateway. The status
should be Connected.

Verify that network traffic reaches the web tier

1. In the Azure Portal, navigate to the resource group that you created.
2. Find the resource named int-dmz-lb, which is the load balancer in front of the
private DMZ. Copy the private IP address from the Overview blade.
3. Find the VM named jb-vm1. Click Connect and use Remote Desktop to connect to
the VM. The user name and password are specified in the onprem.json file.
4. From the Remote Desktop Session, open a web browser and navigate to the IP
address from step 2. You should see the default Apache2 server home page.

DMZ between Azure and the Internet


This reference architecture shows a secure hybrid network that extends an on-premises
network to Azure and also accepts Internet traffic.

This reference architecture extends the architecture described in Implementing a DMZ
between Azure and your on-premises datacenter. It adds a public DMZ that handles Internet
traffic, in addition to the private DMZ that handles traffic from the on-premises network.
Typical uses for this architecture include:

 Hybrid applications where workloads run partly on-premises and partly in Azure.
 Azure infrastructure that routes incoming traffic from on-premises and the Internet.

Architecture
The architecture consists of the following components.

 Public IP address (PIP). The IP address of the public endpoint. External users connected to
the Internet can access the system through this address.
 Network virtual appliance (NVA). This architecture includes a separate pool of NVAs for
traffic originating on the Internet.
 Azure load balancer. All incoming requests from the Internet pass through the load balancer
and are distributed to the NVAs in the public DMZ.
 Public DMZ inbound subnet. This subnet accepts requests from the Azure load balancer.
Incoming requests are passed to one of the NVAs in the public DMZ.
 Public DMZ outbound subnet. Requests that are approved by the NVA pass through this
subnet to the internal load balancer for the web tier.

Recommendations
The following recommendations apply for most scenarios. Follow these recommendations
unless you have a specific requirement that overrides them.
NVA recommendations

Use one set of NVAs for traffic originating on the Internet, and another for traffic originating
on-premises. Using only one set of NVAs for both is a security risk, because it provides no
security perimeter between the two sets of network traffic. Using separate NVAs reduces the
complexity of checking security rules, and makes it clear which rules correspond to each
incoming network request. One set of NVAs implements rules for Internet traffic only, while
another set implements rules for on-premises traffic only.

Include a layer-7 NVA to terminate application connections at the NVA level and maintain
affinity with the backend tiers. This guarantees symmetric connectivity, in which response
traffic from the backend tiers returns through the NVA.

Public load balancer recommendations

For scalability and availability, deploy the public DMZ NVAs in an availability set and use
an Internet facing load balancer to distribute Internet requests across the NVAs in the
availability set.

Configure the load balancer to accept requests only on the ports necessary for Internet traffic.
For example, restrict inbound HTTP requests to port 80 and inbound HTTPS requests to port
443.
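A sketch of restricting the public load balancer to these ports with the Azure CLI follows. The load balancer name matches the one used in the deployment steps later in this article, but treat all names as assumptions about your environment:

```shell
# Hypothetical names; expose only ports 80 and 443 on the public DMZ load balancer.
RG="dmz-network-rg"
LB="pub-dmz-lb"

if command -v az >/dev/null 2>&1; then
  for port in 80 443; do
    az network lb rule create --resource-group "$RG" --lb-name "$LB" \
      --name "allow-$port" --protocol Tcp \
      --frontend-port "$port" --backend-port "$port"
  done
else
  echo "Azure CLI not installed; commands shown for illustration only."
fi
```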

Scalability considerations
Even if your architecture initially requires a single NVA in the public DMZ, we recommend
putting a load balancer in front of the public DMZ from the beginning. That will make it
easier to scale to multiple NVAs in the future, if needed.

Availability considerations
The Internet facing load balancer requires each NVA in the public DMZ inbound subnet to
implement a health probe. A health probe that fails to respond on this endpoint is considered
to be unavailable, and the load balancer will direct requests to other NVAs in the same
availability set. Note that if all NVAs fail to respond, your application will fail, so it's
important to have monitoring configured to alert DevOps when the number of healthy NVA
instances falls below a defined threshold.

Manageability considerations
All monitoring and management for the NVAs in the public DMZ should be performed by
the jumpbox in the management subnet. As discussed in Implementing a DMZ between
Azure and your on-premises datacenter, define a single network route from the on-premises
network through the gateway to the jumpbox, in order to restrict access.

If gateway connectivity from your on-premises network to Azure is down, you can still reach
the jumpbox by deploying a public IP address, adding it to the jumpbox, and logging in from
the Internet.
Security considerations
This reference architecture implements multiple levels of security:

 The Internet facing load balancer directs requests to the NVAs in the inbound public DMZ
subnet, and only on the ports necessary for the application.
 The NSG rules for the inbound and outbound public DMZ subnets prevent the NVAs from
being compromised, by blocking requests that fall outside of the NSG rules.
 The NAT routing configuration for the NVAs directs incoming requests on port 80 and port
443 to the web tier load balancer, but ignores requests on all other ports.

You should log all incoming requests on all ports. Regularly audit the logs, paying attention
to requests that fall outside of expected parameters, as these may indicate intrusion attempts.

Deploy the solution


A deployment for a reference architecture that implements these recommendations is
available on GitHub.

Prerequisites

1. Clone, fork, or download the zip file for the reference architectures GitHub
repository.
2. Install Azure CLI 2.0.
3. Install the Azure building blocks npm package:

bash

npm install -g @mspnp/azure-building-blocks

4. From a command prompt, bash prompt, or PowerShell prompt, sign into your Azure
account as follows:

bash

az login

Deploy resources

1. Navigate to the /dmz/secure-vnet-hybrid folder of the reference architectures
GitHub repository.
2. Run the following command:

bash

azbb -s <subscription_id> -g <resource_group_name> -l <region> -p onprem.json --deploy

3. Run the following command:

bash

azbb -s <subscription_id> -g <resource_group_name> -l <region> -p secure-vnet-hybrid.json --deploy

Connect the on-premises and Azure gateways

In this step, you will connect the two local network gateways.

1. In the Azure Portal, navigate to the resource group that you created.
2. Find the resource named ra-vpn-vgw-pip and copy the IP address shown in the
Overview blade.
3. Find the resource named onprem-vpn-lgw.
4. Click the Configuration blade. Under IP address, paste in the IP address from step
2.

5. Click Save and wait for the operation to complete. It can take about 5 minutes.
6. Find the resource named onprem-vpn-gateway1-pip. Copy the IP address shown in
the Overview blade.
7. Find the resource named ra-vpn-lgw.
8. Click the Configuration blade. Under IP address, paste in the IP address from step
6.
9. Click Save and wait for the operation to complete.
10. To verify the connection, go to the Connections blade for each gateway. The status
should be Connected.
Verify that network traffic reaches the web tier

1. In the Azure Portal, navigate to the resource group that you created.
2. Find the resource named pub-dmz-lb, which is the load balancer in front of the public
DMZ.
3. Copy the public IP address from the Overview blade and open this address in a web
browser. You should see the default Apache2 server home page.
4. Find the resource named int-dmz-lb, which is the load balancer in front of the
private DMZ. Copy the private IP address from the Overview blade.
5. Find the VM named jb-vm1. Click Connect and use Remote Desktop to connect to
the VM. The user name and password are specified in the onprem.json file.
6. From the Remote Desktop Session, open a web browser and navigate to the IP
address from step 4. You should see the default Apache2 server home page.

Deploy highly available network virtual appliances
 12/06/2016

This article shows how to deploy a set of network virtual appliances (NVAs) for high
availability in Azure. An NVA is typically used to control the flow of network traffic from a
perimeter network, also known as a DMZ, to other networks or subnets. To learn about
implementing a DMZ in Azure, see Microsoft cloud services and network security. The
article includes example architectures for ingress only, egress only, and both ingress and
egress.

Prerequisites: This article assumes a basic understanding of Azure networking, Azure load
balancers, and user-defined routes (UDRs).

Architecture Diagrams
An NVA can be deployed to a DMZ in many different architectures. For example, the
following figure illustrates the use of a single NVA for ingress.
In this architecture, the NVA provides a secure network boundary by checking all inbound
and outbound network traffic and passing only the traffic that meets network security rules.
However, the fact that all network traffic must pass through the NVA means that the NVA is
a single point of failure in the network. If the NVA fails, there is no other path for network
traffic and all the back-end subnets are unavailable.

To make an NVA highly available, deploy more than one NVA into an availability set.

The following architectures describe the resources and configuration necessary for highly
available NVAs:

Solution: Ingress with layer 7 NVAs
Benefits: All NVA nodes are active.
Considerations: Requires an NVA that can terminate connections and use SNAT. Requires a
separate set of NVAs for traffic coming from the Internet and from Azure. Can only be used
for traffic originating outside Azure.

Solution: Egress with layer 7 NVAs
Benefits: All NVA nodes are active.
Considerations: Requires an NVA that can terminate connections and implements source
network address translation (SNAT).

Solution: Ingress-egress with layer 7 NVAs
Benefits: All nodes are active; able to handle traffic originated in Azure.
Considerations: Requires an NVA that can terminate connections and use SNAT. Requires a
separate set of NVAs for traffic coming from the Internet and from Azure.

Solution: PIP-UDR switch
Benefits: Single set of NVAs for all traffic; can handle all traffic (no limit on port rules).
Considerations: Active-passive; requires a failover process.
Ingress with layer 7 NVAs
The following figure shows a high availability architecture that implements an ingress DMZ
behind an internet-facing load balancer. This architecture is designed to provide connectivity
to Azure workloads for layer 7 traffic, such as HTTP or HTTPS:

The benefit of this architecture is that all NVAs are active, and if one fails the load balancer
directs network traffic to the other NVA. Both NVAs route traffic to the internal load
balancer so as long as one NVA is active, traffic continues to flow. The NVAs are required to
terminate SSL traffic intended for the web tier VMs. These NVAs cannot be extended to
handle on-premises traffic because on-premises traffic requires another dedicated set of
NVAs with their own network routes.

Note

This architecture is used in the DMZ between Azure and your on-premises datacenter
reference architecture and the DMZ between Azure and the Internet reference architecture.
Each of these reference architectures includes a deployment solution that you can use. Follow
the links for more information.

Egress with layer 7 NVAs


The previous architecture can be expanded to provide an egress DMZ for requests originating
in the Azure workload. The following architecture is designed to provide high availability of
the NVAs in the DMZ for layer 7 traffic, such as HTTP or HTTPS:
In this architecture, all traffic originating in Azure is routed to an internal load balancer. The
load balancer distributes outgoing requests between a set of NVAs. These NVAs direct traffic
to the Internet using their individual public IP addresses.

Note

This architecture is used in the DMZ between Azure and your on-premises datacenter
reference architecture and the DMZ between Azure and the Internet reference architecture.
Each of these reference architectures includes a deployment solution that you can use. Follow
the links for more information.

Ingress-egress with layer 7 NVAs


In the two previous architectures, there was a separate DMZ for ingress and egress. The
following architecture demonstrates how to create a DMZ that can be used for both ingress
and egress for layer 7 traffic, such as HTTP or HTTPS:

In this architecture, the NVAs process incoming requests from the application gateway. The
NVAs also process outgoing requests from the workload VMs in the back-end pool of the
load balancer. Because incoming traffic is routed with an application gateway and outgoing
traffic is routed with a load balancer, the NVAs are responsible for maintaining session
affinity. That is, the application gateway maintains a mapping of inbound and outbound
requests so it can forward the correct response to the original requestor. However, the internal
load balancer does not have access to the application gateway mappings, and uses its own
logic to send responses to the NVAs. It's possible the load balancer could send a response to
an NVA that did not initially receive the request from the application gateway. In this case,
the NVAs must communicate and transfer the response between them so the correct NVA can
forward the response to the application gateway.

Note

You can also solve the asymmetric routing issue by ensuring the NVAs perform inbound
source network address translation (SNAT). This replaces the original source IP address of the
requestor with one of the IP addresses of the NVA used on the inbound flow, ensuring that
you can use multiple NVAs at a time while preserving route symmetry.

PIP-UDR switch with layer 4 NVAs


The following architecture demonstrates an architecture with one active and one passive
NVA. This architecture handles both ingress and egress for layer 4 traffic:

This architecture is similar to the first architecture discussed in this article. That architecture
included a single NVA accepting and filtering incoming layer 4 requests. This architecture
adds a second passive NVA to provide high availability. If the active NVA fails, the passive
NVA is made active and the UDR and PIP are changed to point to the NICs on the now
active NVA. These changes to the UDR and PIP can be made manually or by an automated
process, typically a daemon or other monitoring service running in Azure. It queries a health
probe on the active NVA and performs the UDR and PIP switch when it detects a failure of
the NVA.

The preceding figure shows an example ZooKeeper cluster providing a high availability
daemon. Within the ZooKeeper cluster, a quorum of nodes elects a leader. If the leader fails,
the remaining nodes hold an election to elect a new leader. For this architecture, the leader
node executes the daemon that queries the health endpoint on the NVA. If the NVA fails to
respond to the health probe, the daemon activates the passive NVA. The daemon then calls
the Azure REST API to remove the PIP from the failed NVA and attach it to the newly
activated NVA. The daemon then modifies the UDR to point to the newly activated NVA's
internal IP address.
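A minimal sketch of the leader's failover logic follows. The health probe and the Azure calls are stubbed out as shell functions, and all names and addresses are hypothetical; a real daemon would use an HTTP health check and the Azure REST API or CLI in their place:

```shell
# Sketch of the PIP-UDR switch performed by the monitoring daemon.
# probe_nva and fail_over stand in for a real HTTP health check and the
# Azure REST/CLI calls that move the PIP and rewrite the UDR.

ACTIVE_NVA_IP="10.0.1.4"
PASSIVE_NVA_IP="10.0.1.5"
CURRENT_NVA_IP="$ACTIVE_NVA_IP"

probe_nva() {
  # Real implementation: curl -fsS --max-time 5 "http://$1/health"
  # Stubbed here to simulate a failure of the active NVA.
  [ "$1" != "$ACTIVE_NVA_IP" ]
}

fail_over() {
  # Real implementation: detach the PIP from the failed NVA, attach it to the
  # passive NVA, then update the UDR's next-hop IP address.
  CURRENT_NVA_IP="$PASSIVE_NVA_IP"
  echo "Failover: PIP and UDR now point at $CURRENT_NVA_IP"
}

if ! probe_nva "$CURRENT_NVA_IP"; then
  fail_over
fi
echo "active NVA: $CURRENT_NVA_IP"
```

Running the sketch simulates one probe cycle: the stubbed probe reports the active NVA as failed, so the daemon switches traffic to the passive node.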
Note

Do not include the ZooKeeper nodes in a subnet that is only accessible using a route that
includes the NVA. Otherwise, the ZooKeeper nodes are inaccessible if the NVA fails. Should
the daemon fail for any reason, you won't be able to access any of the ZooKeeper nodes to
diagnose the problem.

You might also like