Monitor and Troubleshoot Your End-To-End Azure Network Infrastructure by Using Network Monitoring Tools
Azure Network Watcher includes several tools that you can use to monitor your virtual
networks and virtual machines (VMs). To effectively make use of Network Watcher, it's
essential to understand all the available options and the purpose of each tool.
In your engineering company, you want to enable your staff to choose the right
Network Watcher tool for each troubleshooting task. They need to understand all the
options available and the kinds of problems that each tool can solve.
Here, you'll look at the Network Watcher tool categories, the tools in each category, and
how each tool is applied in example use cases.
Network Watcher tools fall into two categories:
Monitoring tools
Diagnostic tools
With tools to monitor for and diagnose problems, Network Watcher gives you a
centralized hub for identifying network glitches, CPU spikes, connectivity problems,
memory leaks, and other issues before they affect your business.
The monitoring tools are:
Topology
Connection Monitor
Network Performance Monitor
Let's look at each of these tools.
The topology tool generates a graphical display of your Azure virtual network, its
resources, its interconnections, and their relationships with each other.
Suppose you have to troubleshoot a virtual network created by your colleagues. Unless
you were involved in the creation process of the network, you might not know about all
the aspects of the infrastructure. You can use the topology tool to visualize and
understand the infrastructure you're dealing with before you start troubleshooting.
You use the Azure portal to view the topology of an Azure network.
The Connection Monitor tool checks that connections work between Azure resources, such
as two VMs that need to communicate. It also measures the latency between resources. It
can catch changes that will affect connectivity, such as changes to the network
configuration or changes to network security group (NSG) rules. It can probe VMs at
regular intervals to look for failures or changes.
If there's an issue, Connection Monitor tells you why it occurred and how to fix it. Along
with monitoring VMs, Connection Monitor can examine an IP address or fully qualified
domain name (FQDN).
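You typically configure Connection Monitor in the Azure portal, but you can also script it. The following is only a rough sketch with the Azure CLI; the resource group, monitor, VM, and destination names are placeholders, and depending on your CLI version the newer endpoint-based parameters might be required instead.
Azure CLI
az network watcher connection-monitor create \
--resource-group MyResourceGroup \
--name MyConnectionMonitor \
--source-resource MySourceVM \
--dest-address www.contoso.com \
--dest-port 80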
The Network Performance Monitor tool enables you to track and alert on latency and
packet drops over time. It gives you a centralized view of your network.
When you decide to monitor your hybrid connections by using Network Performance
Monitor, check that the associated workspace is in a supported region.
The diagnostic tools are:
IP flow verify
Next hop
Effective security rules
Packet capture
Connection troubleshoot
VPN troubleshoot
Let's examine each tool and find out how they can help you solve problems.
The IP flow verify tool tells you if packets are allowed or denied for a specific virtual
machine. If a network security group denies a packet, the tool tells you the name of that
group so that you can fix the problem.
When a VM sends a packet to a destination, it might take multiple hops in its journey.
For example, if the destination is a VM in a different virtual network, the next hop might
be the virtual network gateway that routes the packet to the destination VM.
With the next hop tool, you can determine how a packet gets from a VM to any
destination. You specify the source VM, source network adapter, source IP address, and
destination IP address. The tool then reports the next hop type and IP address for the
packet. You can use this tool to diagnose problems caused by incorrect routing tables.
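For example, here's a sketch of running the next hop check from the Azure CLI. The resource group, VM name, and IP addresses are placeholders for your own values.
Azure CLI
az network watcher show-next-hop \
--resource-group MyResourceGroup \
--vm MyVM \
--source-ip 10.0.0.4 \
--dest-ip 10.1.0.4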
The effective security rules tool in Network Watcher displays all the effective NSG rules
applied to a network interface.
Network security groups (NSGs) are used in Azure networks to filter packets based on
their source and destination IP address and port numbers. NSGs are vital to security
because they help you carefully control the surface area of the VMs that users can
access. Keep in mind, though, that a mistakenly configured NSG rule might prevent
legitimate communication. As a result, NSGs are a frequent source of network problems.
For example, if two VMs can't communicate because an NSG rule blocks them, it can be
difficult to diagnose which rule is causing the problem. You'll use the effective security
rules tool in Network Watcher to display all the effective NSG rules and help you
diagnose which rule is causing the specific problem.
To use the tool, you choose a VM and its network adapter. The tool displays all the NSG
rules that apply to that adapter. It's easy to determine a blocking rule by viewing this
list.
You can also use the tool to spot vulnerabilities for your VM caused by unnecessary
open ports.
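You can also retrieve the same effective rules from the Azure CLI. A minimal sketch, with placeholder resource group and network interface names:
Azure CLI
az network nic list-effective-nsg \
--resource-group MyResourceGroup \
--name MyVMNic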
You use the packet capture tool to record all of the packets sent to and from a VM.
You'll then review the capture to gather statistics about network traffic or diagnose
anomalies, such as unexpected network traffic on a private virtual network.
The packet capture tool is a virtual machine extension that Network Watcher starts
remotely. The extension is installed automatically when you start a packet capture session.
Keep in mind that there's a limit to the number of packet capture sessions allowed per
region. The default usage limit is 100 packet capture sessions per region, and the overall
limit is 10,000. These limits apply to the number of sessions only, not to saved captures.
You can save captured packets in Azure Storage or locally on your computer.
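To start a capture from the Azure CLI instead of the portal, a sketch looks like this. The VM must have the Network Watcher agent extension installed, and the resource group, VM, capture, and storage account names here are placeholders.
Azure CLI
az network watcher packet-capture create \
--resource-group MyResourceGroup \
--vm MyVM \
--name MyPacketCapture \
--storage-account mystorageaccount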
You use the connection troubleshoot tool to check TCP connectivity between a source
and destination VM. You can specify the destination VM by using an FQDN, a URI, or an
IP address.
If the connection is unsuccessful, you'll see details of the fault, such as high CPU or
memory utilization on the VM, a blocking guest firewall rule, or a DNS resolution failure.
You can use the VPN troubleshoot tool to diagnose problems with virtual network
gateway connections. This tool runs diagnostics on a virtual network gateway
connection and returns a health diagnosis.
When you start the VPN troubleshoot tool, Network Watcher diagnoses the health of
the gateway or connection, and returns the appropriate results. The request is a long-
running transaction.
Your colleagues have deployed a VM in Azure and are having network connectivity
issues. Your colleagues are trying to use Remote Desktop Protocol (RDP) to connect to
the virtual machine, but they can't connect.
To troubleshoot this issue, use the IP flow verify tool. This tool lets you specify a local
and remote port, the protocol (TCP/UDP), the local IP, and the remote IP to check the
connection status. It also lets you specify the direction of the connection (inbound or
outbound). IP flow verify runs a logical test on the rules in place on your network.
In this case, use IP flow verify to specify the VM's IP address and the RDP port 3389.
Then, specify the remote VM's IP address and port. Choose the TCP protocol, and then
select Check.
Suppose the result shows that access was denied because of the NSG
rule DefaultInboundDenyAll. The solution is to change the NSG rule.
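If you'd rather run this check from the Azure CLI, a sketch looks like the following. The resource group, VM name, local address and port, and the remote client address are placeholders for your own values.
Azure CLI
az network watcher test-ip-flow \
--resource-group MyResourceGroup \
--vm MyVM \
--direction Inbound \
--protocol TCP \
--local 10.10.1.4:3389 \
--remote 203.0.113.10:60000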
Your colleagues have deployed VMs in two virtual networks and can't connect between
them.
To troubleshoot a VPN connection, use Azure VPN troubleshoot. This tool runs
diagnostics on a virtual network gateway connection, and returns a health diagnosis.
You can run this tool from the Azure portal, PowerShell, or the Azure CLI.
When you run the tool, it checks the gateway for common issues and returns the health
diagnosis. You can also view the log file to get more information. The diagnosis will
show whether the VPN connection is working. If the VPN connection isn't working, VPN
troubleshoot will suggest ways to resolve the issue.
Suppose the diagnosis shows a key mismatch. To resolve the problem, reconfigure the
remote gateway to make sure the keys match on both ends. Pre-shared keys are case-
sensitive.
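As a sketch, the same diagnosis from the Azure CLI looks like this. The resource group, connection name, storage account, and container URL are placeholders; the tool writes its detailed logs to the storage path that you provide.
Azure CLI
az network watcher troubleshooting start \
--resource-group MyResourceGroup \
--resource MyVpnConnection \
--resource-type vpnConnection \
--storage-account mystorageaccount \
--storage-path https://mystorageaccount.blob.core.windows.net/troubleshooting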
Your colleagues have deployed VMs in a single virtual network and can't connect
between them.
Use the connection troubleshoot tool to troubleshoot this issue. In this tool, you specify
the local and remote VMs. In the probe setting, you can choose a specific port.
Suppose the results show the remote server is Unreachable, along with the message
"Traffic blocked due to virtual machine firewall configuration." On the remote server,
disable the firewall, and then test the connection again.
Suppose the server is now reachable. This result indicates that firewall rules on the
remote server are the issue, and must be corrected to permit the connection.
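A sketch of the same check from the Azure CLI follows. The resource group and VM names are placeholders, and the source VM needs the Network Watcher agent extension installed.
Azure CLI
az network watcher test-connectivity \
--resource-group MyResourceGroup \
--source-resource SourceVM \
--dest-resource DestVM \
--dest-port 3389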
Azure Network Watcher helps you diagnose configuration errors that prevent virtual
machines (VMs) from communicating.
Suppose you have two VMs that can't communicate. You want to diagnose the problem
and resolve it as fast as possible. You want to use Network Watcher to do that.
Important
You need your own Azure subscription to run this exercise, and you might incur
charges. If you don't already have an Azure subscription, create a free account before
you begin.
1. In your browser, open the Azure Cloud Shell, and log in to the directory with
access to the subscription you want to create resources in.
2. To create a variable to store your resource group name, and a resource
group for your resources, in the Bash Cloud Shell, run the following
command. Replace <resource group name> with a name for your resource
group, and <location> with the Azure region you'd like to deploy your
resources in.
Azure CLI
RG=<resource group name>
az group create --name $RG --location <location>
3. To create a virtual network with a front-end subnet, run this command.
Azure CLI
az network vnet create \
--resource-group $RG \
--name MyVNet1 \
--address-prefix 10.10.0.0/16 \
--subnet-name FrontendSubnet \
--subnet-prefix 10.10.1.0/24
4. To add a back-end subnet to the virtual network, run this command.
Azure CLI
az network vnet subnet create \
--address-prefixes 10.10.2.0/24 \
--name BackendSubnet \
--resource-group $RG \
--vnet-name MyVNet1
5. To create a VM in the front-end subnet, run this command. Replace <password> with a
password of your choice.
Azure CLI
az vm create \
--resource-group $RG \
--name FrontendVM \
--vnet-name MyVNet1 \
--subnet FrontendSubnet \
--image Win2019Datacenter \
--admin-username azureuser \
--admin-password <password>
6. To install the IIS web server on FrontendVM, run this command.
Azure CLI
az vm extension set \
--publisher Microsoft.Compute \
--name CustomScriptExtension \
--vm-name FrontendVM \
--resource-group $RG \
--settings '{"commandToExecute":"powershell.exe Install-
WindowsFeature -Name Web-Server"}' \
--no-wait
7. To create a VM in the back-end subnet, run this command. Replace <password> with a
password of your choice.
Azure CLI
az vm create \
--resource-group $RG \
--name BackendVM \
--vnet-name MyVNet1 \
--subnet BackendSubnet \
--image Win2019Datacenter \
--admin-username azureuser \
--admin-password <password>
8. To create a network security group, run this command.
Azure CLI
az network nsg create \
--name MyNsg \
--resource-group $RG
9. To create an NSG rule that denies inbound TCP traffic on ports 80, 443, and 3389, run
this command.
Azure CLI
az network nsg rule create \
--resource-group $RG \
--name MyNSGRule \
--nsg-name MyNsg \
--priority 4096 \
--source-address-prefixes '*' \
--source-port-ranges '*' \
--destination-address-prefixes '*' \
--destination-port-ranges 80 443 3389 \
--access Deny \
--protocol TCP \
--direction Inbound \
--description "Deny from specific IP address ranges on 80, 443 and 3389."
10. To associate the network security group with a subnet, run this command.
Azure CLI
az network vnet subnet update \
--resource-group $RG \
--name BackendSubnet \
--vnet-name MyVNet1 \
--network-security-group MyNsg
Enable Network Watcher for your region
Now, to set up Network Watcher in the same region as the infrastructure, use the Azure
CLI. Run this command, and replace <location> with the region you used for your
resource group.
Azure CLI
az network watcher configure \
--locations <location> \
--enabled true \
--resource-group $RG
Next, use Connection Monitor to test connectivity from the back-end VM to the front-end
VM. Select + Add, configure the first test with these values, and then select Add.
Name: Back-to-front-RDP-test
Subscription: Select your subscription
Virtual machine: BackendVM
Destination virtual machine: FrontendVM
Port: 3389
Probing interval (seconds): 30
Select + Add again. Configure a second test with these values, and then select Add.
Name: Back-to-front-HTTP-test
Subscription: Select your subscription
Virtual machine: BackendVM
Destination virtual machine: FrontendVM
Port: 80
Probing interval (seconds): 30
The results should show that, because the NSG is associated with the back-end subnet
only, traffic flows without issues from the back-end VM to the front-end VM.
Now create connection monitor tests in the other direction, from the front-end VM to the
back-end VM. Select + Add, configure the first test with these values, and then select Add.
Name: front-to-back-RDP-test
Subscription: Select your subscription
Virtual machine: FrontendVM
Destination virtual machine: BackendVM
Port: 3389
Probing interval (seconds): 30
Select + Add again. Configure a second test with these values, and then select Add.
Name: Front-to-back-HTTP-test
Subscription: Select your subscription
Virtual machine: FrontendVM
Destination virtual machine: BackendVM
Port: 80
Probing interval (seconds): 30
The results should show that, because the NSG is associated with the back-end subnet,
no traffic flows from the front-end VM to the back-end VM.
Next, use the IP flow verify tool to check traffic from the back-end VM. Configure it with
these values:
Subscription: Select your subscription
Resource group: Select your resource group
Virtual machine: BackendVM
Network interface: BackendVMVMNic
Protocol: TCP
Direction: Outbound
Local IP address: 10.10.2.4
Local port: 3389
Remote IP: 10.10.1.4
Remote port: 3389
Select Check, and then examine the results. They show that access is denied because of
an NSG security rule.
In this exercise, you have successfully used Network Watcher tools to discover the
connectivity issue between the two subnets. Communication is allowed one way but
blocked the other way because of NSG rules.
Troubleshoot a network by using Network Watcher metrics and logs
If you want to diagnose a problem quickly, you have to understand the information
that's available in the Azure Network Watcher logs.
In your engineering company, you want to minimize the time it takes for your staff to
diagnose and resolve any network configuration problem. You want to ensure they
know which information is available in which logs.
In this module, you'll focus on flow logs, diagnostic logs, and traffic analytics. You'll learn
how these tools can help to troubleshoot the Azure network.
These logging tools apply to Azure network resources such as:
Network interfaces
Network security groups (NSGs)
Virtual networks
Public IP addresses
The tools you'll use are:
Flow logs
Diagnostic logs
Traffic analytics
Flow logs
In flow logs, you can view information about ingress and egress IP traffic on network
security groups. Flow logs show outbound and inbound flows on a per-rule basis, based
on the network adapter that the flow applies to. NSG flow logs record whether traffic was
allowed or denied based on the 5-tuple information captured (see the example record
after this list). This information includes:
Source IP
Source port
Destination IP
Destination port
Protocol
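Each flow is recorded as a comma-separated tuple inside the JSON log file. A version 1 record looks similar to this illustrative example:
1542110377,10.0.0.4,13.67.143.118,44931,443,T,O,A
The fields are the Unix timestamp, source IP, destination IP, source port, destination port, protocol (T for TCP, U for UDP), direction (I for inbound, O for outbound), and decision (A for allowed, D for denied).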
Flow logs store data in a JSON file. It can be difficult to gain insights into this data by
manually searching the log files, especially if you have a large infrastructure deployment
in Azure. You can solve this problem by using Power BI.
In Power BI, you can create visualizations of NSG flow logs, for example, top talkers and
flows by direction, decision, or destination port.
You can also use open source tools to analyze your logs, such as Elastic Stack, Grafana,
and Graylog.
Note
NSG flow logs don't support storage accounts on the Azure classic portal.
Diagnostic logs
In Network Watcher, diagnostic logs are a central place to enable and disable logs for
Azure network resources. These resources might include NSGs, public IP addresses, load
balancers, and application gateways. After you've enabled the logs that interest you, you
can use your choice of tools to query and view the log entries.
You can import diagnostic logs into Power BI and other tools to analyze them.
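For example, a sketch of enabling NSG diagnostic logs from the Azure CLI looks like this. The setting name and storage account are placeholders, the NSG resource ID must be supplied, and the categories shown are the NSG event and rule counter logs.
Azure CLI
az monitor diagnostic-settings create \
--name MyNsgDiagnostics \
--resource <NSG resource ID> \
--storage-account mystorageaccount \
--logs '[{"category":"NetworkSecurityGroupEvent","enabled":true},{"category":"NetworkSecurityGroupRuleCounter","enabled":true}]'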
Traffic analytics
To investigate user and app activity across your cloud networks, use traffic analytics.
The tool gives insights into network activity across subscriptions. You can diagnose
security threats such as open ports, VMs communicating with known bad networks, and
traffic flow patterns. Traffic analytics analyzes NSG flow logs across Azure regions and
subscriptions. You can use the data to optimize network performance.
This tool requires Log Analytics. The Log Analytics workspace must exist in a supported
region.
To resolve slow performance, you need to determine the root cause of the problem.
First, check that the VM size is appropriate for the job. Next, enable Azure Diagnostics
on the VM to get more granular data for specific metrics, such as CPU usage and
memory usage. To enable VM diagnostics in the portal, go to the VM,
select Diagnostics settings, and then turn on diagnostics.
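You can also pull the same platform metrics from the command line. Here's a sketch with the Azure CLI, where the VM resource ID is a placeholder for your own VM.
Azure CLI
az monitor metrics list \
--resource <VM resource ID> \
--metric "Percentage CPU" \
--interval PT5M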
Let's assume you have a VM that has been running fine. However, the VM's performance
has recently degraded. To identify if you have any resource bottlenecks, you need to
review the captured data.
Start with a time range of captured data before, during, and after the reported problem
to get an accurate view of performance. These graphs can also be useful for cross-
referencing different resource behaviors in the same period. You'll check for:
CPU bottlenecks
Memory bottlenecks
Disk bottlenecks
CPU bottlenecks
When you're looking at performance issues, examine trends to understand whether they
affect your server. To spot trends, use the monitoring graphs in the Azure portal. Look for
patterns such as isolated spikes, sustained high usage, or a steady increase over time.
Memory bottlenecks
You can view the amount of memory that the VM uses. Logs help you understand the
trend and whether it maps to the time when you see issues. You shouldn't have less than
100 MB of available memory at any time. Watch out for steadily decreasing available
memory and heavy page file usage. If you see memory pressure:
For immediate relief, increase the size of the VM to add memory, and then continue
monitoring.
Investigate the issue further. Locate the app or process that's consuming memory, and
troubleshoot it. If you know the app, see if you can cap its memory allocation.
Disk bottlenecks
Network performance might also be related to the storage subsystem of the VM. You
can investigate the storage account for the VM in the portal. To identify issues with
storage, look at performance metrics from the storage account diagnostics and the VM
diagnostics. Look for key trends when the issues occur within a particular time range.
To troubleshoot an NSG flow issue, use the Network Watcher IP flow verify tool and
NSG flow logging to determine whether an NSG or User Defined Routing (UDR) is
interfering with traffic flow.
Run IP flow verify, and specify the local VM and the remote VM. After you select Check,
Azure runs a logical test on rules in place. If the result is that access is allowed, use NSG
flow logs.
In the portal, go to the NSGs. Under the flow log settings, select On. Now try to connect
to the VM again. Use Network Watcher traffic analytics to visualize the data. If the result
is that access is allowed, there's no NSG rule in the way.
If you've reached this point and still haven't diagnosed the problem, there might be
something wrong on the remote VM. Disable the firewall on the remote VM, and then
retest connectivity. If you can connect to the remote VM with the firewall disabled, verify
the remote firewall settings. Then re-enable the firewall.
By default, all subnets can communicate in Azure. If two VMs on two subnets can't
communicate, there must be a configuration that's blocking communication. Before you
check the flow logs, run the IP flow verify tool from the front-end VM to the back-end
VM. This tool runs a logical test on the rules on the network.
If the result shows that an NSG on the back-end subnet is blocking all communication,
reconfigure that NSG. For security purposes, you must block some communication with
the front end because the front end is exposed to the public internet.
By blocking communication to the back end, you limit the amount of exposure in the
event of a malware or security attack. However, if the NSG blocks everything, then it's
incorrectly configured. Enable the specific protocols and ports that are required.
In Azure Network Watcher, metrics and logs can help you diagnose complex configuration issues.
Suppose you have two virtual machines (VMs) that can't communicate. You want to
obtain as much information as you can to diagnose the problem.
In this unit, you'll troubleshoot by using Network Watcher metrics and logs. You'll then
use network security group (NSG) flow logs to diagnose the connectivity issue between
the two VMs.
1. Sign in to the Azure portal, and log in to the directory with access to the
subscription you created resources in.
2. In the Azure portal, search for and select Subscriptions.
3. Select your subscription. Then under Settings, select Resource providers.
4. In the search bar, enter microsoft.insights.
5. If the status of the microsoft.insights provider is Unregistered,
select Register.
Create a storage account to hold the flow log data, using these values:
On the Basics tab, under Project details:
Subscription: Select your subscription
Resource group: Select your resource group
Under Instance details:
Storage account name: Create a unique name
Location: Select the same region as your resources
Performance: Standard
Account kind: StorageV2
Replication: Read-access geo-redundant storage (RA-GRS)
Under Blob storage:
Blob access tier (default): Hot
Create a Log Analytics workspace for traffic analytics, using these values:
On the Basics tab, under Project details:
Subscription: Select your subscription
Resource group: Select your resource group
Under Instance details:
Name: testsworkspace
Region: Select the same region as your resources
On the Pricing tier tab:
Pricing tier: Pay-as-you-go
Enable flow logs
To set up flow logs, you must configure the NSG to connect to the storage account, and
add traffic analytics for the NSG.
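This exercise configures flow logs in the portal. As a rough Azure CLI equivalent, assuming the NSG, storage account, and workspace created earlier in this exercise, the command looks something like this (replace the placeholders with your own values):
Azure CLI
az network watcher flow-log create \
--resource-group $RG \
--name MyFlowLog \
--location <location> \
--nsg MyNsg \
--storage-account <storage account name> \
--workspace testsworkspace \
--traffic-analytics true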
After the flow log is enabled, connect to FrontendVM and test connectivity to BackendVM
on port 80 by running this command in PowerShell.
PowerShell
Test-NetConnection 10.10.2.4 -Port 80
Summary
In this module, you learned about the four Network Watcher tool categories and the
features each offers.
Azure Network Watcher provides all the tools you need to monitor, troubleshoot, and
optimize your network. This module primarily focused on monitoring and diagnostic
tools, such as:
Connection Monitor
IP flow verify
Next hop
Packet capture
Connection troubleshoot
Effective security rules
NSG flow logs
Diagnostic logs
Learn more
Azure Network Watcher Agent virtual machine extension for Windows
Network Watcher Agent virtual machine extension for Linux
Visualizing network security group flow logs with Power BI
Visualize Azure Network Watcher NSG flow logs using open-source tools
Network Performance Monitor supported regions