VMware vSphere® 6.0


Knowledge Transfer Kit
Technical Walk-Through

© 2015 VMware Inc. All rights reserved.

1 VMware vSphere® 6.0 Knowledge Transfer Kit


Technical Walk-Through

2 Agenda VMware ESXi™ Virtual Machines VMware vCenter Server™


VMware vSphere vMotion®
Availability
VMware vSphere High Availability
VMware vSphere Fault Tolerance
VMware vSphere Distributed Resource Scheduler™
Content Library
VMware Certificate Authority (CA)
Storage
Networking

3 Technical Walk-Through
The technical walk-through expands on the architectural presentation to provide more detailed technical best
practice and troubleshooting information for each topic
This is not comprehensive coverage of each topic
If you require more detailed information, consult the VMware vSphere Documentation; VMware Global
Support Services might also be of assistance

4 ESXi
5 Components of ESXi
The ESXi architecture comprises the underlying operating system, called the VMkernel, and processes that run
on top of it
VMkernel provides a means for running all processes on the system, including management applications and
agents as well as virtual machines
It has control of all hardware devices on the server and manages resources for the applications
The main processes that run on top of VMkernel are
Direct Console User Interface (DCUI)
Virtual Machine Monitor (VMM)
VMware Agents (hostd, vpxa)
Common Information Model (CIM) System

6 Components of ESXi (cont.)


Direct Console User Interface
Low-level configuration and management interface, accessible through the console of the server, used
primarily for initial basic configuration
Virtual Machine Monitor
Process that provides the execution environment for a virtual machine, as well as a helper process known as
VMX. Each running virtual machine has its own VMM and VMX process
VMware Agents (hostd and vpxa)
Used to enable high-level VMware Infrastructure™ management from remote applications
Common Information Model System
Interface that enables hardware-level management from remote applications through a set of standard APIs

7 ESXi Technical Details


VMkernel
A POSIX-like operating system developed by VMware, which provides certain functionality similar to that
found in other operating systems, such as process creation and control, signals, file system, and process
threads
Designed specifically to support running multiple virtual machines, and provides core functionality such as
Resource scheduling
I/O stacks
Device drivers
Some of the more pertinent aspects of the VMkernel are presented in the following sections

8 ESXi Technical Details (cont.)


File System
VMkernel uses a simple in-memory file system to hold the ESXi Server configuration files, log files, and staged
patches
The file system structure is designed to be the same as that used in the service console of traditional ESX
Server. For example
ESX Server configuration files are found in /etc/vmware
Log files are found in /var/log/vmware
Staged patches are uploaded to /tmp
This file system is independent of the VMware vSphere VMFS file system used to store virtual machines
The in-memory file system does not persist when the power is shut down. Therefore, log files do not survive a
reboot if no scratch partition is configured
ESXi has the ability to configure a remote syslog server and remote dump server, enabling you to save all log
information on an external system
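As a minimal sketch of that setup from the ESXi Shell (the collector host name, VMkernel interface, and IP address below are assumptions, not values from this kit):
  # Forward host logs to a remote syslog collector, then apply the change
  esxcli system syslog config set --loghost='tcp://syslog.example.com:514'
  esxcli system syslog reload
  # Send VMkernel coredumps to a remote dump collector and verify the path works
  esxcli system coredump network set --interface-name vmk0 --server-ipv4 192.0.2.10 --server-port 6500
  esxcli system coredump network set --enable true
  esxcli system coredump network check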

9 ESXi Technical Details (cont.)


User Worlds
The term user world refers to a process running in the VMkernel operating system. The environment in which
a user world runs is limited compared to what is found in a general-purpose POSIX-compliant operating system
such as Linux
The set of available signals is limited
The system API is a subset of POSIX
The /proc file system is very limited
A single swap file is available for all user world processes. If a local disk exists, the swap file is created
automatically in a small VFAT partition. Otherwise, the user is free to set up a swap file on one of the attached
VMFS datastores
Several important processes run in user worlds. Think of these as native VMkernel applications. They are
described in the following sections

10 ESXi Technical Details (cont.)


Direct Console User Interface (DCUI)
DCUI is the local user interface that is displayed only on the console of an ESXi system
It provides a BIOS-like, menu-driven interface for interacting with the system. Its main purpose is initial
configuration and troubleshooting
The DCUI configuration tasks include
Set administrative password
Set Lockdown mode (if attached to VMware vCenter™)
Configure and revert networking tasks
Troubleshooting tasks include
Perform simple network tests
View logs
Restart agents
Restore defaults

11 ESXi Technical Details (cont.)


Other User World Processes
Agents used by VMware to implement certain management capabilities have been ported from running in the
service console to running in user worlds
The hostd process provides a programmatic interface to VMkernel, and it is used by direct VMware vSphere
Client™ connections as well as APIs. It is the process that authenticates users and keeps track of which users
and groups have which privileges
The vpxa process is the agent used to connect to vCenter. It runs as a special system user called vpxuser. It
acts as the intermediary between the hostd agent and vCenter Server
The FDM agent used to provide vSphere High Availability capabilities has also been ported from running in the
service console to running in its own user world
A syslog daemon runs as a user world. If you enable remote logging, that daemon forwards all log files to the
remote target in addition to putting them in local files
A process that handles initial discovery of an iSCSI target, after which point all iSCSI traffic is handled by the
VMkernel, just as it handles any other device driver

12 ESXi Technical Details (cont.)


Open Network Ports – A limited number of network ports are open on ESXi. The most important ports and
services are
80 – This port serves a reverse proxy that is open only to display a static Web page that you see when
browsing to the server. Otherwise, this port redirects all traffic to port 443 to provide SSL-encrypted
communications to the ESXi Server
443 (reverse proxy) – This port also acts as a reverse proxy to a number of services to provide SSL- encrypted
communication to these services. The services include API access to the host, which provides access to the
RCLIs, the vSphere Client, vCenter Server, and the SDK
5989 – This port is open for the CIM server, which is an interface for third-party management tools
902 – This port is open to support the older VIM API, specifically the older versions of the vSphere Client and
vCenter
Many other services (vSphere High Availability, vSphere vMotion, and so on) have their own port
requirements, but these ports are only opened if the corresponding services are configured
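To confirm which ports are actually open on a given host, the ESXi firewall CLI can be queried; a quick sketch (the ruleset name in the second command is just an example):
  # List all firewall rulesets and whether each is enabled
  esxcli network firewall ruleset list
  # Show the port rules behind one ruleset
  esxcli network firewall ruleset rule list --ruleset-id sshServer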
13 ESXi Troubleshooting
Troubleshooting ESXi is much the same as troubleshooting any operating system
Start by narrowing down the component that is causing the problem
Next, review the logs as required to narrow down the issue
Common log files are as follows (a grep sketch follows the list)
/var/log/auth.log: ESXi Shell authentication success and failure
/var/log/esxupdate.log: ESXi patch and update installation logs
/var/log/hostd.log: Host management service logs, including virtual machine and host Task and Events,
communication with the vSphere Client and vCenter Server vpxa agent, and SDK connections
/var/log/syslog.log: Management service initialization, watchdogs, scheduled tasks and DCUI use
/var/log/vmkernel.log: Core VMkernel logs, including device discovery, storage and networking device and
driver events, and virtual machine startup
/var/log/vmkwarning.log: A summary of Warning and Alert log messages excerpted from the VMkernel logs
/var/log/vmksummary.log: A summary of ESXi host startup and shutdown, and an hourly heartbeat with
uptime, number of virtual machines running, and service resource consumption
/var/log/vpxa.log: vCenter Server vpxa agent logs, including communication with vCenter Server and the Host
Management hostd agent
/var/log/fdm.log: vSphere High Availability logs, produced by the FDM service
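A quick way to work through these files is grep from the ESXi Shell; the patterns below are illustrative, not an official procedure:
  # Scan the VMkernel log for recent errors and failures
  grep -iE 'error|fail' /var/log/vmkernel.log | tail -20
  # Follow management-agent activity live while reproducing the problem
  tail -f /var/log/hostd.log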

14 ESXi Best Practices


For in-depth ESXi and other component best practices, read the Performance Best Practices Guide
Always set up the VMware vSphere Syslog Collector (Windows) / VMware Syslog Service (Appliance) to
remotely collect and store the ESXi log files
Always set up the VMware vSphere ESXi Dump Collector Service to allow dumps to be remotely collected in
the case of a VMkernel failure
Ensure that only the firewall ports required by running services are enabled in the Security profile
Ensure the management network is isolated from the general network (VLAN) to decrease the attack surface
of the hosts
Ensure the management network has redundancy through NIC Teaming or by having multiple management
interfaces
Ensure that the ESXi Shell and SSH connectivity are not permanently enabled
Performance Best Practices Guide for vSphere 6.0
(Soon to be released)
Performance Best Practices Guide for vSphere 5.5

15 Virtual Machines
16 Virtual Machine Troubleshooting
Virtual machines run as processes on the ESXi host
Troubleshooting is split into two categories
Inside the Guest OS – Standard OS troubleshooting should be used, including the OS-specific log files
ESXi host level troubleshooting – Concerning the virtual machine process, where the log file for the virtual
machine is reviewed for errors
ESXi host virtual machine log files are located, by default, in the directory in which the virtual machine runs,
and are named vmware.log
Generally issues occur as a result of a problem in the guest OS
Host level crashes of the VM processes are relatively rare and are normally a result of hardware errors or
compatibility of hardware between hosts
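To locate and review a VM's log from the ESXi Shell, something like the following works; the datastore and VM names are placeholders:
  # List registered VMs with their configuration file (and therefore working directory) paths
  vim-cmd vmsvc/getallvms
  # Scan the VM's log for errors and warnings
  grep -iE 'error|warning' /vmfs/volumes/datastore1/app01/vmware.log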

17 Virtual Machine Best Practices


Virtual machines should always run VMware Tools™ to ensure that the correct drivers are installed for virtual
hardware
Right-size VMs to ensure that they use only required hardware. If VMs are provisioned with an over-allocation
of resources that are not used, ESXi host performance and capacity is reduced
Any devices not being used should be disconnected from VMs (CD-ROM/DVD, floppy, and so on)
If NUMA is used on ESXi, VMs should be right-sized to the size of the NUMA nodes on the host to avoid
performance loss
VMs should be stored on shared storage to allow for the maximum vSphere vMotion compatibility and
vSphere High Availability configurations in a cluster
Memory/CPU reservations should not be used regularly because they reserve the resource and can prevent
the VMware vSphere Hypervisor from being able to take advantage of over commitment technologies
VMs partitions should be aligned to the storage array partition alignment
Storage and Network I/O Control can dramatically help VM performance in times of contention

18 vCenter Server
19 vCenter Server 6.0 with Embedded Platform Services Controller
[Diagram: a single machine running both the Platform Services Controller services (SSO, CM, License, IS) and the management node services (Web, TOOLS).]
Sufficient for most environments
Easiest to maintain and deploy
Recommended for 8 or fewer vCenter Servers
vCenter Server and the infrastructure controller are deployed on a single virtual machine or physical host.
vCenter Server with an embedded infrastructure controller is suitable for smaller environments with eight or
fewer product instances.
To provide the common services, such as vCenter Single Sign-On, across multiple products and vCenter Server
instances, you can connect multiple vCenter Server instances with embedded infrastructure controllers
together.
You can do this by replicating the vCenter Single Sign-On data from one infrastructure controller to the
others. This way, infrastructure data for each product is replicated to all of the infrastructure controllers,
and each individual infrastructure controller contains a copy of the data for all of the infrastructure
controllers.
The embedded infrastructure controller supports both an internal database (vPostgres) and an external
database, such as Oracle or Microsoft SQL Server.
vCenter Server 6.0 with an embedded infrastructure controller is available in both Windows and virtual
appliance formats.
Supports embedded and external database
Available for Windows and vCenter Server Appliance

20 vCenter Server 6.0 with External Platform Services Controller


[Diagram: the Platform Services Controller (SSO, CM, License, IS) on one machine, with a separate management node (Web, TOOLS, VC).]
For larger customers with numerous vCenter Servers
Reduces infrastructure by sharing Platform Services Controller across several vCenter Servers
Recommended for 9 or more vCenter Servers
vCenter Server and the infrastructure controller are deployed on separate virtual machines or physical hosts.
The Platform Services Controller can be shared across many products. This configuration is suitable for larger
environments with nine or more product instances.
To provide the common services, such as vCenter Single Sign-On, across multiple products and vCenter Server
instances, the products are connected together through the Platform Services Controller. The infrastructure
data that each product needs is in the Platform Services Controller.
If you have more than one Platform Services Controller, you can set up the controllers to replicate data with
each other all the time, so that the data from each Platform Services Controller is shared with every product.
The Platform Services Controller is lightly loaded and doesn’t use as many resources as the management
nodes.
Supports embedded and external database
Available for Windows and vCenter Server Appliance

21 Installation / Upgrade Overview


Component MSI Deployment
~ 50 component MSIs
Most MSIs only deliver binaries and configuration files to the machine
Firstboot Scripts
The installation process consists of two key parts.
First the vSphere installer copies approximately 44 MSI files to the installation directory of the Windows
machine. These MSI files are binary installation and configuration files. All MSI files are installed silently.
[Click]
Second, once the MSI files are copied to the installation folder, the firstboot process starts. Depending on the
deployment type that you choose, a different number of services will be installed.
An embedded node has about 25 services, the management node about 24, and an infrastructure controller
about 7.
Firstboot scripts are written in Python and the same scripts are used for both the Windows installer and the
vCenter Virtual Appliance.
Depending on your deployment type, these Python firstboot scripts are run to install and configure the
Components.
Firstboot scripts also take care of generating certificates and registering the components with the Component
Manager Service. It will also create and start the Services, and if needed, open ports in the firewall.
~28 firstboot scripts – depending on the install method
Configures the product
Generates certificates
Registration between components
Creates and starts services
Opens ports in firewall

22 Installation Overview/Architecture
This graphic shows how the installation progresses for vSphere 6
[Diagram: the vCenter Server 6.0 installer runs a parent MSI that installs ~50 component MSIs; firstboot scripts then configure the services, including SSO, VPXD, the vCenter database (vPostgres or external), NGC, and SMS/SPBM.]
Here’s a graphical overview of the installation process.
Once the installation media is downloaded and mounted on the destination machine, the vSphere 6.0 Installer
menu is launched.
Once you pick the option to install vCenter for Windows, you will be prompted with a series of questions such
as
Deployment type
What database you want to use
Credentials for SSO, and so on
Once the installer has captured all this information the MSI files are copied from the installation media to the
destination installation folder. The number of MSIs will vary depending on the installation type.
28 firstboot scripts will be copied for an embedded node. Fewer will be copied for other deployment types.
[CLICK]
Once the MSIs are copied, they are installed and the firstboot process starts. The firstboot scripts run and
configure the services, performing tasks such as generating certificates, installing and starting services, and
registering the components.
As the firstboot process progresses, the components are installed and configured until you have successfully
installed all the components.
23 Platform Services Controller


Upgrade Scenarios
The upgrade decision-making process depends on the placement of the vCenter Single Sign-On service and
the vCenter services
If a machine has Single Sign-On and vCenter Server installed, it becomes vCenter Server with embedded
Platform Services Controller
[Diagram: a vSphere 5.x host with all components (Web Client, vCenter Server, Inventory Service, Single Sign-On) upgrades to a vSphere 6.0 embedded deployment: vCenter Server with an embedded Platform Services Controller.]
24 Upgrade Scenarios (cont.)
If one machine has vCenter Server and Inventory Service installed, and vSphere Web Client and Single Sign-On
are on separate machines
[Diagram: in 5.x, one machine runs vCenter Server and Inventory Service, with Web Client and Single Sign-On on separate machines; in 6.0, Single Sign-On becomes the PSC, and the management node runs vCenter Server, Inventory Service, and Web Client.]

25 Upgrade Scenarios (cont.)


All vSphere 5.x components are installed on separate servers/VMs
After the upgrade, the standalone Web Client and Inventory Service servers are defunct; their roles become
part of the management node
[Diagram: separate 5.x servers for Web Client, vCenter Server, Inventory Service, and Single Sign-On consolidate into a 6.0 management node (vCenter Server, Web Client, Inventory Service) plus a PSC (Single Sign-On).]

26 Upgrade Scenarios (cont.)


vSphere 6.0 still requires a load balancer for Single Sign-On high availability
We are still working out the details
F5 and Citrix NetScaler
[Diagram: two 5.x Single Sign-On servers behind a load balancer become two PSCs behind a load balancer in 6.0, with the management node (vCenter Server, Web Client, Inventory Service) connecting through it.]

27 vCenter Troubleshooting
vCenter for Windows has been consolidated and reorganized in this release
Installation and logging directories mimic the vCenter Server Appliance in previous releases
Start by narrowing down the component that is causing the problem
Next, review the logs as required to narrow down the issue
Each process now has its own logging directory
28 vCenter Troubleshooting – Installer Logs
vSphere Installer logs
Can show up in %TEMP% or %TEMP%\<number>, for example, %TEMP%\1
vminst.log – Logging created by custom actions – usually verification and handling of MSI properties
*msi.log (for example, vim-vcs-msi.log or vm-ciswin-msi.log)
MSI installation log: strings produced by the Microsoft Installer backend
pkgmgr.log – contains a list of installed sub-MSIs (for example, VMware-OpenSSL.msi) and the command lines
used to install them
pkgmgr-comp-msi.log – the MSI installation logs for each of the ~50 sub-MSIs (appended into one file)

29 vCenter Troubleshooting – Log Bundle


Generate log bundles from the command prompt
In the command prompt navigate to
C:\Program Files\VMware\vCenter Server\bin
Run the command vc-support.bat -w %temp%\
The log bundle is generated in the system temp directory
The VC-Support is located in
C:\Program Files\VMware\vCenter Server\bin
To run it, type vc-support.bat in
C:\Program Files\VMware\vCenter Server\bin folder
Without any parameters, the vc-support output bundle is written to the %TEMP% folder.
You can specify an alternate location using the -w switch.
...

30 vCenter Server Log Bundle


Support bundle comes as TGZ format
Once the support bundle is on the local machine, you must unzip it
Commands – OS-specific details
Program Files – Configuration and properties for vCenter components
ProgramData – Component log files and firstboot logs
Users – %TEMP% folder
Windows – local Hosts file
The output from the vc-support bundle is in TGZ format.
Components of a support bundle include
Commands – OS-specific details
Program Files – Configuration and properties for vCenter components
ProgramData – Component log files and firstboot logs are located here
Users - %TEMP% folder
Windows – local hosts file

31 VC-Support – ProgramData > VMware > CIS – cfg


What is my database and deployment node type?
\ProgramData\VMware\CIS\cfg\db.type
\ProgramData\VMware\CIS\cfg\deployment.node.type
vCenter Configuration files – vmware-vpx folder
vpxd.cfg, vcdb.properties, embedded_db.cfg
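For example, from a command prompt on a Windows vCenter Server, the two files can be read directly (paths as shown above):
  type C:\ProgramData\VMware\CIS\cfg\db.type
  type C:\ProgramData\VMware\CIS\cfg\deployment.node.type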

32 vCenter Best Practices


Verify that vCenter, the Platform Services Controller, and any database have adequate CPU, memory, and disk
resources available
Verify that the proper inventory size is configured during the installation
Minimize latency between components (vCenter and Platform Services Controller) by minimizing network
hops between components
External databases should be used for large deployments
If using Enhanced Linked Mode, VMware recommends having external Platform Services Controllers
Verify that DNS is configured and functional for all components
Verify that time is correct on vCenter and all other components in the environment
VMware vSphere Update Manager™ should be installed on a separate system if inventory is large

33 vSphere vMotion
34 vSphere vMotion and vSphere Storage vMotion Troubleshooting
vSphere vMotion and vSphere Storage vMotion are some of the best logged features in vSphere
Each migration that occurs has a unique Migration ID (MID) that can be used to search logs for the vSphere
vMotion and vSphere Storage vMotion
MIDs look as follows:
Each time a vSphere vMotion and vSphere Storage vMotion is attempted, all logs can be reviewed to find the
error using grep and searching for the term Migrate
Both the source and the destination logs should be reviewed
The following is a list of common log files and errors
VMKernel.log – VMkernel logs usually contain storage or network errors (and possibly vSphere vMotion and
vSphere Storage vMotion timeouts)
hostd.log – contains interactions between vCenter and ESXi
vmware.log – virtual machine log file which will show issues with starting the virtual machine processes
vpxd.log – vSphere vMotion as seen from vCenter; normally shows a timeout or other less useful data because
the errors occur on the host itself
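A hedged sketch of that workflow from the ESXi Shell (the Migration ID below is a placeholder; use the one found in the first grep):
  # Find migration entries and note the Migration ID for the failed attempt
  grep -i Migrate /var/log/vmkernel.log
  # Narrow to that one migration across the host logs, on both source and destination
  grep 1422107466124022 /var/log/vmkernel.log /var/log/hostd.log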

35 vSphere vMotion Troubleshooting – Example vmkernel.log – Source


T16:47:04.555Z cpu0:305224)Migrate: vm : InitMigration:3215: Setting VMOTION info: Source ts = , src ip = < >
dest ip = < > Dest wid = using SHARED swap
T16:47:04.571Z cpu0:305224)Migrate: StateSet:158: S: Changing state from 0 (None) to 1 (Starting migration
off)
T16:47:04.572Z cpu0:305224)Migrate: StateSet:158: S: Changing state from 1 (Starting migration off) to 3
(Precopying memory)
T16:47:04.587Z cpu1:3589)Migrate: VMotionServerReadFromPendingCnx:192: Remote machine is ESX 4.0 or
newer.
T16:47:05.155Z cpu1:588763)VMotionSend: PreCopyStart:1294: S: Starting Precopy, remote version
T16:47:07.985Z cpu1:305226)VMotion: MemPreCopyIterDone:3927: S: Stopping pre-copy: only 156 pages left
to send, which can be sent within the switchover time goal of seconds (network bandwidth ~ MB/s, 51865%
t2d)
T16:47:07.991Z cpu1:305226)VMotion: PreCopyDone:3259:
Migration ID
S: for Source
Troubleshooting: Logging – Source
We can see the source and destination vMotion interface IP addresses.
Next, we see the Migration ID and the state changing from no vMotion to starting.
Notice that S: means that this log file is from the source ESXi Server.
We start the pre-copies. We do a number of pre-copy passes until the amount of memory left to transfer is
small enough that we can transfer it in less than a second.
36 vMotion Troubleshooting – Example vmkernel.log – Destination
Migration ID is the same on the Destination
T16:45:35.156Z cpu1:409301)Migrate: vm : InitMigration:3215: Setting VMOTION info: Dest ts = , src ip = < >
dest ip = < > Dest wid = 0 using SHARED swap
T16:45:35.190Z cpu1:409301)Migrate: StateSet:158: D: Changing state from 0 (None) to 2 (migration on)
T16:45:35.432Z cpu0:3556)Migrate: VMotionServerReadFromPendingCnx:192: Remote machine is ESX 4.0 or
newer.
T16:45:36.101Z cpu1:409308)VMotionRecv: PreCopyStart:416: D: got MIGRATE_MSG_PRECOPY_START
T16:45:36.101Z cpu1:409308)Migrate: StateSet:158: D: Changing state from 2 (Starting migration on) to 3
(Precopying memory)
T16:45:38.831Z cpu0:409308)VMotionRecv: PreCopyEnd:466: D: got MIGRATE_MSG_PRECOPY_END
T16:45:38.831Z cpu0:409308)VMotionRecv: PreCopyEnd:478: D: Estimated network bandwidth MB/s during
pre-copy
T16:45:38.917Z cpu0:409308)Migrate: StateSet:158: D: Changing state from 3 (Precopying memory) to 5
(Transferring cpt data)
T16:45:39.070Z cpu0:409308)Migrate: StateSet:158: D: Changing state from 5 (Transferring cpt data) to 6
(Loading cpt data)
D: for Destination
Troubleshooting: Logging – Destination
This is the destination log file for the same vMotion.
We can tell it is the same migration because the Migration ID is identical
We can tell it’s the destination because it uses D.
Highlighted is the entry that is logged when it receives a message from the source telling it that it’s finished
the pre-copy iterations.
At the bottom, we can see it loading the checkpoint data.
37 vSphere vMotion Best Practices
ESXi host hardware should be as similar as possible to avoid failures
VMware Virtual Machine Hardware compatibility is important to avoid failures as newer hardware revisions
cannot be run on older ESXi hosts
10 Gb networking will improve vSphere vMotion performance
vSphere vMotion networking should be segregated from other traffic to prevent saturation of network links
Multiple network cards can be configured for vSphere vMotion VMkernel networking to improve performance
of migrations

38 vSphere Storage vMotion Best Practices


If vSphere Storage vMotion traffic takes place on storage that might also have other I/O loads (from other VMs
on the same ESXi host or from other hosts), it can further reduce the available bandwidth, so it should be
done during times when there will be less impact
vSphere Storage vMotion will have the highest performance during times of low storage activity (when
available storage bandwidth is highest) and when the workload in the VM being moved is least active
vSphere Storage vMotion can perform up to four simultaneous disk copies per vSphere Storage vMotion
operation. However, vSphere Storage vMotion will involve each datastore in no more than one disk copy at
any one time. This means, for example, that moving four VMDK files from datastore A to datastore B will occur
serially, but moving four VMDK files from datastores A, B, C, and D to datastores E, F, G, and H will occur in
parallel
For performance-critical vSphere Storage vMotion operations involving VMs with multiple VMDK files, you can
use anti-affinity rules to spread the VMDK files across multiple datastores, thus ensuring simultaneous disk
copies
vSphere Storage vMotion will often have significantly better performance on vStorage APIs for Array
Integration (VAAI)-capable storage arrays

39 Availability
vSphere High Availability

40 vSphere High Availability Technical Details


vSphere High Availability uses a different agent model from that used by previous versions of vSphere High
Availability
There is no longer any primary/secondary host designation
It uses both the management network and storage devices for communication
It introduces IPv6 support
vSphere High Availability also has a new deployment and configuration mechanism which reduces the cluster
configuration time to ~1 minute by configuring hosts in parallel rather than serially
This is a vast improvement when compared to the ~1 minute per host in previous versions of vSphere High
Availability
It supports the concept of management network partitioning, where the cluster can continue to function
when some hosts are unreachable on the management network
Error reporting has also been improved with FDM. Now it uses a single log file per host and supports syslog
integration

41 vSphere High Availability Technical Details (cont.)


In the vSphere High Availability architecture, each host in the cluster runs an FDM agent
The FDM agents do not use vpxa and are completely decoupled from it
The agent (or FDM) on one host is the master, and the agents on all other hosts are its slaves
When vSphere High Availability is enabled, all FDM agents participate in an election to choose the master
The agent that wins the election becomes the master
If the host that is serving as the master should subsequently fail, be shut down, or need to abdicate its role, a
new master election is held

42 vSphere High Availability Technical Details – Role of the Master


A master monitors ESXi hosts and VM availability
A master will monitor slave hosts and it will restart VMs in the event of a slave host failure
It manages the list of hosts that are members of the cluster and manages adding and removing hosts from
the cluster
It monitors the power state of all the protected VMs, and if one should fail, it will restart the VM
It manages the list of protected VMs and updates this list after each user-initiated power on or power off
It sends heartbeats to the slaves so the slaves know the master is alive
It caches the cluster configuration and informs the slaves of changes in configuration
A master reports state information to vCenter through property updates
Comparing this list to the existing product, note that many of the master's responsibilities also exist in the
AAM architecture.
For example, in both architectures, the health of hosts is monitored.
In the AAM architecture, these responsibilities were shared by the vpxa/HA agents on each host, and by the
AAM primary that ran the rule.
In the FDM architecture, these responsibilities are all provided by the FDM agent running on the master host.

43 vSphere High Availability Technical Details – Role of the Slave


A slave monitors the runtime state of the VMs running locally and forwards significant state changes to the
master
It implements vSphere High Availability features that do not require central coordination, most notably VM
health monitoring
It monitors the health of the master, and if the master should fail, it participates in a new master election

44 vSphere High Availability Technical Details – Master and Slave Summary Views
45 vSphere High Availability Technical Details – Master Election
A master is elected when the following conditions occur
vSphere High Availability is enabled
A master host fails
A management network partition occurs
The following algorithm is used for selecting the master
If a host has the greatest number of datastores, it is the best host
If there is a tie, then the host with the lexically highest moid is chosen. For example, moid "host-99" would be
higher than moid "host-100" because 9 is greater than 1
After a master is elected and contacts vCenter, vCenter sends a compatibility list to the master which saves it
on its local disk, and then pushes it out to the slave hosts in the cluster
vCenter normally talks only to a master. It will sometimes talk to FDM agents on other hosts, especially if the
master states that it cannot reach a slave agent; vCenter will try to contact the other host to figure out why
Moid – Managed Object ID – vCenter identifier
There are some other scenarios when vCenter will talk to the other FDM agents
When scanning for master
When vCenter powers on a vSphere FT secondary VM
When host is reported isolated or partitioned

46 vSphere High Availability Technical Details – Partitioning


Under normal operating conditions, there is only one master
However, if a management network failure occurs, a subset of the hosts might become isolated. This means
that they cannot communicate with the other hosts in the cluster over the management network
In such a situation, when hosts can continue to ping the isolation response IP address but cannot reach other
hosts, FDM considers them network partitioned
Each partition without an existing master will elect a new one
Thus, a partitioned cluster state will have multiple masters, one per partition
However, vCenter cannot report on more than one master, so you might get details for only one partition:
the one whose master vCenter finds first
When a network partition is corrected, one of the masters will take over from the others, thus reverting back
to a single master

47 vSphere High Availability Technical Details – Isolation


In some ways this is similar to a network partition state, except that a host can no longer ping the default
gateway/isolation IP address
In this case, a host is called network isolated
The host has the ability to inform the master that it is in this isolation state, through files on the heartbeat
datastores, which will be discussed shortly
Then the Host Isolation Response is checked to see when the VMs on this host should be shut down or left
powered on
If they are powered off, they can be restarted on other hosts in the cluster

48 vSphere High Availability Technical Details – Virtual Machine Protection


The master is responsible for restarting any protected VMs that fail
The trigger to protect a VM is the master observing that the power state of the VM changes from powered off
to powered on
The trigger to unprotect a VM is the master observing the VM's power state changing from powered on to
powered off
After the master protects the VM, the master will inform vCenter that the VM has been protected, and vCenter
will report this fact through the vSphere High Availability Protection runtime property of the VM
Because templates are in essence powered-off VMs, templates are not protected. Further, VMs that are
created from the templates are not protected until the VMs are powered on.
Periodically (that is, once every 5 minutes), vCenter will compare the list it has to the protected VM list last
reported by the FDM master. If there are any VMs on the vCenter list but not the master's, vCenter will call
into the master to inform it of the difference.
The master, in turn, will ensure that each VM in the list provided by vCenter has been protected.

49 vSphere High Availability Troubleshooting


Troubleshooting vSphere High Availability since vSphere 5.1 is greatly simplified
Agents were upgraded from using a third party component to using a component built by VMware called Fault
Domain Manager (FDM)
A single log file, fdm.log, now exists for communication of all events related to vSphere High Availability
When troubleshooting a vSphere High Availability failure, be sure to collect logs from all hosts in the cluster
This is because when a vSphere High Availability event occurs, VMs might be moved to any host in the cluster.
To track all events, the FDM log for each host (including the master host) is required
This should be the first port of call for
Partitioning issues
Isolation issues
VM protection issues
Election issues
Failure to failover issues
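As an illustrative starting point (the grep patterns are assumptions, not an official list of keywords):
  # Pull election, isolation, and partition events from the HA agent log on each host
  grep -iE 'election|master|isolat|partition' /var/log/fdm.log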
50 vSphere High Availability Best Practices
Networking
When performing maintenance, use the host network maintenance feature to suspend vSphere High
Availability monitoring
When changing networking configuration, always reconfigure vSphere High Availability afterwards
Specify which networks are used for vSphere High Availability communication. By default, this is the
management network
Specify isolation addresses as appropriate for the cluster, if the default gateway does not allow for ICMP pings
Network paths should be redundant to avoid isolations of vSphere High Availability

51 vSphere High Availability Best Practices (cont.)


Interoperability
Do not mix versions of ESXi in the same cluster
Virtual SAN uses its network for vSphere High Availability, rather than the default
When enabling Virtual SAN, vSphere High Availability should be disabled first and then enabled
Admission Control
Select the policy that best matches the need in the environment
Do not disable admission control or VMs might not all be able to fail over if an event occurs
Size hosts equally to prevent imbalances

52 Availability
vSphere FT

53 vSphere FT Troubleshooting
vSphere FT has been completely rewritten in vSphere 6.0
Now, CPU compatibility is the same as vSphere vMotion compatibility because the same technology is used to
ship memory, CPU, storage, and network states across to the secondary virtual machine
When troubleshooting
Get logs for both primary and secondary VMs and hosts
Grab logs before log rotation
Ensure time is synchronized on all hosts
When reviewing the configuration, you should find both primary and secondary VMX logs in the primary VMs
directory
They will be named vmware.log and vmware-snd.log
Also, be sure to review vmkernel.log and hostd.log from both the primary and secondary hosts for errors

54 vSphere FT Troubleshooting – General Things To Look For (vmkernel, vmx)


T18:12:25.892Z cpu3:35660)FTCpt: 2401: ( pri) Primary init: nonce
T18:12:25.892Z cpu3:35660)FTCpt: 2440: ( pri) Setting allowedDiffCount = 64
T18:12:25.892Z cpu3:35660)FTCpt: 1217: Queued accept request for ftPairID
T18:12:25.892Z cpu3:35660)FTCpt: 2531: ( pri) vmx vmm 35662
T18:12:25.892Z cpu1:32805)FTCpt: 1262: ( pri) Waiting for connection
Generally, multiprocessor vSphere FT messages are prefixed with "FTCpt:" in the vmkernel and vmx logs.
Like vSphere vMotion, each vSphere FT session has a unique vSphere FT ID, taken from the migration ID that
started it and shared by the vmx, vmkernel, primary, and secondary (it can be used to verify that all logs are
present).
The role of the VM is either "pri" or "snd".

55 vSphere FT Troubleshooting – Legacy vSphere FT or vSphere FT?


Check the vmware.log (vmx log) file of the primary or secondary
Search for: "ftcpt.enabled"
If present and set to "TRUE": multiprocessor vSphere FT
Otherwise: legacy (uniprocessor) vSphere FT
Important for triaging failures

56 vSphere FT Troubleshooting – Has vSphere FT Started?


vmkernel.log
T14:32:13.607Z cpu5:89619)FTCpt: 3831: ( pri) Start stamp: T14:32:13.607Z nonce

T14:46:23.860Z cpu2:89657)FTCpt: 9821: ( pri) Last ack stamp: T14:46:15.639Z nonce
Grepping for FTCpt in the vmkernel.log provides a robust set of information.
You can see vSphere FT starting for a VM by finding the keywords Start Stamp and Last ack stamp.
Also the vmware.log file provides very clean and easy-to-read notifications of the vSphere FT state changes.
Again, grep for FTCpt.
When the secondary is being created, an XvMotion is started, using the vMotion network. If this fails, vSphere
FT will fail to start.
vmware.log
T22:56:01.635Z| vcpu-0| I120: FTCpt: Activated ftcpt in VMM.
If you do not see these, vSphere FT may not have started
Check for XvMotion migration errors
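A short sketch of those checks from the ESXi Shell (the vmware.log path is a placeholder):
  # Trace the FT session in the VMkernel log; FTCpt marks multiprocessor FT messages
  grep FTCpt /var/log/vmkernel.log
  # Confirm which FT flavor the VM uses
  grep ftcpt.enabled /vmfs/volumes/datastore1/app01/vmware.log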

57 vSphere FT Best Practices


Hosts running primary and secondary VMs should run at approximately the same processor frequency to
avoid errors
Homogeneous clusters work best for vSphere FT
All hosts should have
Common access to datastores used by VMs
The same virtual network configuration
The same BIOS settings (power management, hyper-threading, and so on)
FT Logging networks should be configured with 10 Gb networking connections
Jumbo frames can also help performance of vSphere FT
Network configuration should
Distribute each NIC team over two physical switches
Use deterministic teaming policies to ensure network traffic affinity
ISOs should be stored on shared storage

58 Availability
vSphere Distributed Resource Scheduler

59 DRS Troubleshooting
DRS uses a proprietary algorithm to assess and determine resource usage and to determine which hosts to
balance VMs to
DRS primarily uses vMotion to facilitate movements
Troubleshooting failures generally consist of figuring out why vMotion failed, and not DRS itself as the
algorithm just follows resource utilization
Ensure the following
vSphere vMotion is enabled and configured
The migration aggressiveness is set appropriately
DRS is set to fully automated if approvals are not needed for migrations
To test DRS, from the vSphere Web Client, select the Run DRS option, which will initiate recommendations
Failures can be assessed and corrected at that time
60 DRS Best Practices
Hosts should be as homogeneous as possible to ensure predictability of DRS placements
vSphere vMotion compatibility is required across all hosts or DRS will not function
The more hosts available, the better DRS functions because there are more options for available placement of
VMs
VMs that have a smaller CPU/RAM footprint provide more opportunities for placement across hosts
DRS Automatic mode should be used to take full benefit of DRS
Idle VMs can affect DRS placement decisions
DRS anti-affinity rules should be used to keep VMs apart, such as in the case of a load-balanced configuration
providing high availability

61 Content Library
62 Content Library Troubleshooting
The Content Library is easy to troubleshoot because there are two basic areas to examine
Creation/administration of Content Libraries
This area consists of issues with Content Library creation, storage backing, creating and synchronizing
Content Library items, and subscription problems.
Log files are cls-debug.log / cls-cis-debug.log
They are located in /var/log/vmware/vdcs/ or C:\ProgramData\VMware\CIS\logs\vdcs
Synchronization of Content Libraries
This area consists of synchronization failures and problems with adding items to a content library. You can
also track transfer session IDs between cls-debug and ts-debug.
Log files are ts-debug.log / ts-cis-debug.log
They are located in /var/log/vmware/vdcs/ or C:\ProgramData\VMware\CIS\logs\vdcs
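For example, to follow one transfer across both services on the appliance (the session ID is a placeholder copied from an earlier cls-debug.log entry):
  grep 'example-session-id' /var/log/vmware/vdcs/cls-debug.log /var/log/vmware/vdcs/ts-debug.log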

63 Content Library Troubleshooting – Logging (Modifying Level)


VCSA: /usr/lib/vmware-vdcs/vdcserver/webapps/cls/WEB-INF/log4j.properties
Windows: %ProgramFiles%\VMware\vCenter Server\vdcs\vdcserver\webapps\cls\WEB-INF\log4j.properties
Modifying Log Level
Make a backup before editing
Locate the required entries and modify
log4j.logger.xxx / log4j.appender.xxx
Modify the level (OFF / FATAL / ERROR / WARN / INFO / DEBUG / TRACE)
Restart vmware-vdcs service
log4j.properties is used for configuring the logging for these Java services.
Follow the steps as defined on the screen.
Notice that the Content Library (cls) portion of the paths can be replaced with the Transfer Service (ts) to
configure logging for that service.
Note that restarting vmware-vdcs will restart the Virtual Datacenter Service, Content Library, and Transfer
Service.
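As a sketch only, a DEBUG override in log4j.properties might look like the following; the logger and appender names are illustrative, so match them to the entries already present in the file:
  # Raise an existing logger to DEBUG (logger name is an example)
  log4j.logger.com.vmware.vdcs=DEBUG
  # The appender threshold may also need lowering (appender name is an example)
  log4j.appender.LOGFILE.Threshold=DEBUG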
[EXTRA INFORMATION]
================
Loggers are logical log file names. They are the names that are known to the Java application. Each logger is
independently configurable as to what level of logging (FATAL, ERROR, etc.) it currently logs. In early versions
of log4j, these were called category and priority, but now they're called logger and level, respectively.
The actual outputs are done by Appenders. There are numerous Appenders available, with descriptive names,
such as FileAppender, ConsoleAppender, SocketAppender, SyslogAppender, NTEventLogAppender and even
SMTPAppender. Multiple Appenders can be attached to any Logger, so it's possible to log the same
information to multiple outputs; for example to a file locally and to a socket listener on another computer.
Level – Description
OFF – The highest possible rank; intended to turn off logging.
FATAL – Severe errors that cause premature termination. Expect these to be immediately visible on a status
console.
ERROR – Other runtime errors or unexpected conditions. Expect these to be immediately visible on a status
console.
WARN – Use of deprecated APIs, poor use of an API, 'almost' errors, and other runtime situations that are
undesirable or unexpected but not necessarily "wrong". Expect these to be immediately visible on a status
console.
INFO – Interesting runtime events (startup/shutdown). Expect these to be immediately visible on a console, so
be conservative and keep to a minimum.
DEBUG – Detailed information on the flow through the system. Expect these to be written to logs only.
TRACE – Most detailed information. Expect these to be written to logs only.

64 Content Library Troubleshooting – Advanced Settings


Modifying Advanced
Administration > System Configuration > Services
Changes take immediate effect
Advanced settings for Content Library and Transfer service appear in the same location

65 Content Library Troubleshooting


Common problems with the Content Library
Incorrect user permissions between the two vCenter Servers; permissions are managed with Global
Permissions from the vSphere Web Client
Password-protected content libraries can cause authentication failures when trying to connect to them
Insufficient space on the subscriber can cause errors when trying to synchronize content libraries

66 Content Library Best Practices


Ensure that there is enough available space on the subscriber to be able to download the content library
Ensure that the synchronization occurs off hours if utilization of bandwidth is a concern

67 VMware Certificate Authority


68 VMware CA – Management Tools
A set of CLIs is available for managing VMware CA, the VMware Endpoint Certificate Store, and the VMware
Directory Service
certool
Use to generate private keys and public keys
Use to request a certificate
Use to promote a plain certificate server to a root CA
dir-cli
Use to create/delete/list/manage solution users in VMDirectory
vecs-cli
Use to create/delete/list/manage key stores in VMware Endpoint Certificate Store
Use to create/delete/list/manage private keys and certificates in the key stores
Use to manage the permissions on the key stores
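A few illustrative invocations (run from the tool directories listed on the next slide; the store name is the standard machine SSL store):
  # List certificate stores, then the entries in the machine SSL store
  vecs-cli store list
  vecs-cli entry list --store MACHINE_SSL_CERT --text
  # List solution users registered in the VMware Directory Service
  dir-cli service list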

69 VMware CA – Management Tools (cont.)


By default, the tools are in the following locations
Platform
Location
Windows
C:\Program Files\VMware\vCenter Server\vmafdd\vecs-cli.exe
C:\Program Files\VMware\vCenter Server\vmafdd\dir-cli.exe
C:\Program Files\VMware\vCenter Server\vmca\certool.exe
Linux
/usr/lib/vmware-vmafd/bin/vecs-cli
/usr/lib/vmware-vmafd/bin/dir-cli
/usr/lib/vmware-vmca/bin/certool

70 certool Configuration File


certool uses a configuration file called certool.cfg
Override its values by using --config=<file name> or per-field switches such as --Locality="Cork"
OS
Location
VCSA
/usr/lib/vmware-vmca/share/config
Windows
C:\Program Files\VMware\vCenter Server\vmcad
certool.cfg
Country = US
Name= cert
Organization = VMware
OrgUnit = Support
State = California
Locality = Palo Alto
IPAddress =
=
Hostname = machine.vmware.com
Note: When using the --Locality switch to override the Locality information from the certool.cfg file, the
keyword "--Locality" must be capitalized.
Other switches (see the next slide) are lowercase switches.
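A hedged sketch of generating a key pair and then requesting a certificate with certool (the file names and the Locality override are examples):
  # Generate a key pair
  certool --genkey --privkey=machine.priv --pubkey=machine.pub
  # Generate a certificate from the key, overriding one certool.cfg field
  certool --gencert --cert=machine.crt --privkey=machine.priv --Locality="Cork"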

71 Machine SSL Certificates


The SSL certificates for each node, also called machine certificates, are used to establish a socket that allows
secure communication. For example, using HTTPS or LDAPS
During installation, VMware CA provisions each machine (vCenter / ESXi) with an SSL certificate
Used for secure connections to other services and for other HTTPS traffic
The machine SSL certificate is used as follows
By the reverse proxy service on each Platform Services Controller node. SSL connections to individual vCenter
services always go to the reverse proxy; traffic does not go to the services themselves
By the vCenter service on management and embedded nodes
By the VMware Directory Service on PSC and Embedded nodes
By the ESXi host for all secure connections

72 Solution User Certificates


Solution user certificates are used for authentication to vCenter Single Sign-On
vCenter Single Sign-On issues the SAML tokens that allow services and other users to authenticate
Each solution user must be authenticated to vCenter Single Sign-On
A solution user encapsulates several services and uses the certificates to authenticate with vCenter Single
Sign-On through SAML token exchange
The Security Assertion Markup Language (SAML) token contains group membership information so that the
SAML token can be used for authorization operations
Solution user certificates enable the solution user to use any other vCenter service that vCenter Single Sign-
On supports without authenticating

73 Certificate Deployment Options


VMware CA Certificates
You can use the certificates that VMware CA assigned to vSphere components as is
These certificates are stored in the VMware Endpoint Certificate Store on each machine
VMware CA is a Certificate Authority, but because all certificates are signed by VMware CA itself, the
certificates do not include a certificate chain
Third-Party Certificates with VMware CA
You can use third-party certificates with VMware CA
VMware CA becomes an intermediary in the certificate chain that the third-party certificate is using
VMware CA provisions vSphere components that you add to the environment with certificates that are signed
by the full chain
Administrators are responsible for replacing all certificates that are already in your environment with new
certificates

74 Certificate Deployment Options (cont.)


Third-Party Certificates without VMware CA
You can add third-party certificates as is to VMware Endpoint Certificate Store
Certificates must be stored in VMware Directory Services and VMware Endpoint Certificate Store, but VMware
CA is not included in your certificate chain
In that case, VMware CA no longer provisions new components with certificates

75 VMware CA Best Practices


Replacement of the certificates is not required to have trusted connections
VMware CA is a CA, and therefore, all certificates used by vSphere components are fully valid and trusted
certificates
Adding the VMware CA certificate as a trusted root certificate allows the SSL warnings to be eliminated
Integration of VMware CA to an existing CA infrastructure should be done in secure environments
This allows the root certificate to be replaced, such that it acts as a subordinate CA to the existing
infrastructure

76 Storage
77 Storage Troubleshooting
Troubleshooting storage is a broad topic that very much depends on the type of storage in use
Consult the vendor to determine what is normal and expected for storage
In general, the following are problems that are frequently seen
Overloaded storage
Slow storage

78 Problem 1 – Overloaded Storage


Monitor the number of disk commands aborted on the host
If Disk Command Aborts > 0 for any LUN, then storage is overloaded on that LUN
What are the causes of overloaded storage?
Excessive demand is placed on the storage device
Storage is misconfigured
Check
Number of disks per LUN
RAID level of a LUN
Assignment of array cache to a LUN

79 Problem 2 – Slow Storage


For a host’s LUNs, monitor Physical Device Read Latency and Physical Device Write Latency counters
If average > 10ms or peak > 20ms for any LUN, then storage might be slow on that LUN
Or monitor the device latency (DAVG/cmd) in resxtop/esxtop.
If value > 10, this might be a problem
If value > 20, this is a problem
Three main workload factors that affect storage response time
I/O arrival rate
I/O size
I/O locality
Use the storage device’s monitoring tools to collect data to characterize the workload
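On the ESXi side, esxtop can capture those counters for offline review; a minimal sketch (interval and sample count are arbitrary):
  # Batch-capture all counters every 10 seconds, 60 samples, for later analysis
  esxtop -b -d 10 -n 60 > /tmp/esxtop_capture.csv
  # Interactively: run esxtop, press u for the disk-device view, and watch DAVG/cmd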

80 Example 1 – Bad Disk Throughput


[Charts: one datastore showing good throughput with low device latency, contrasted with one showing bad throughput and high device latency (due to disabled cache).]
81 Example 2 – Virtual Machine Power On Is Slow
User complaint – Powering on a virtual machine takes longer than usual
Sometimes, powering on a virtual machine takes 5 seconds
Other times, powering on a virtual machine takes 5 minutes!
What do you check?
Check the disk metrics for the host. This is because powering on a virtual machine requires disk activity

82 Monitoring Disk Latency Using the vSphere Client


Maximum disk latencies range from 100 ms to 1,100 ms. This is very high

83 Using esxtop to Examine Slow VM Power On


Rule of thumb
GAVG/cmd > 20ms = high latency!
What does this mean?
Latency when the command reaches the device is high
Latency as seen by the guest is high
Low KAVG/cmd means the command is not queuing in the VMkernel

Very Large Values for DAVG/cmd and GAVG/cmd

84 Solving the Problem of Slow VM Power On


Monitor disk latencies if there is slow access to storage
The cause of the problem might not be related to virtualization
Host events show that a disk has connectivity issues. This leads to high latencies

85 Storage Troubleshooting – Resolving Performance Problems


Consider the following when resolving storage performance problems
Check your hardware for proper operation and optimal configuration
Reduce the need for storage by your hosts and virtual machines
Balance the load across available storage
Understand the load being placed on storage devices
To resolve the problems of slow or overloaded storage, solutions can include the following
Verify that hardware is working properly
Configure the HBAs and RAID controllers for optimal use
Upgrade your hardware, if possible
Consider the trade-off between memory capacity and storage demand
Some applications, such as databases, cache frequently used data in memory, thus reducing storage loads
Eliminate all possible swapping to reduce the burden on the storage subsystem

86 Storage Troubleshooting – Balancing the Load


Spread I/O loads over the available paths to the storage
For disk-intensive workloads
Use enough HBAs to handle the load
If necessary, dedicate separate storage processors to separate systems

87 Storage Troubleshooting – Understanding Load


Understand the workload
Use storage array tools to capture workload statistics
Strive for complementary workloads
Mix disk-intensive with non-disk-intensive virtual machines on a datastore
Mix virtual machines with different peak access times
88 Storage Best Practices – Fibre Channel
Best practices for Fibre Channel arrays
Place only one VMFS datastore on each LUN
Do not change the path policy the system sets for you unless you understand the implications of making such
a change
Document everything. Include information about zoning, access control, storage, switch, server and FC HBA
configuration, software and firmware versions, and storage cable plan
Plan for failure
Make several copies of your topology maps. For each element, consider what happens to your SAN if the
element fails
Cross off different links, switches, HBAs and other elements to ensure you did not miss a critical failure point
in your design
Ensure that the Fibre Channel HBAs are installed in the correct slots in the host, based on slot and bus speed.
Balance PCI bus load among the available busses in the server
Become familiar with the various monitor points in your storage network, at all visibility points, including
host's performance charts, FC switch statistics, and storage performance statistics
Be cautious when changing IDs of the LUNs that have VMFS datastores being used by your ESXi host. If you
change the ID, the datastore becomes inactive and its virtual machines fail

89 Storage Best Practices – iSCSI


Best practices for iSCSI arrays
Place only one VMFS datastore on each LUN. Multiple VMFS datastores on one LUN is not recommended
Do not change the path policy the system sets for you unless you understand the implications of making such
a change
Document everything. Include information about configuration, access control, storage, switch, server and
iSCSI HBA configuration, software and firmware versions, and storage cable plan
Plan for failure
Make several copies of your topology maps. For each element, consider what happens to your SAN if the
element fails
Cross off different links, switches, HBAs, and other elements to ensure you did not miss a critical failure point
in your design
Ensure that the iSCSI HBAs are installed in the correct slots in the ESXi host, based on slot and bus speed.
Balance PCI bus load among the available busses in the server
If you need to change the default iSCSI name of your iSCSI adapter, make sure the name you enter is
worldwide unique and properly formatted. To avoid storage access problems, never assign the same iSCSI
name to different adapters, even on different hosts

90 Storage Best Practices – NFS


Best practices for NFS arrays
Make sure that NFS servers you use are listed in the VMware Hardware Compatibility List. Use the correct
version for the server firmware
When configuring NFS storage, follow the recommendations from your storage vendor
Verify that the NFS volume is exported using NFS over TCP
Verify that the NFS server exports a particular share as either NFS 3 or NFS 4.1, but does not provide both
protocol versions for the same share. This policy needs to be enforced by the server because ESXi does not
prevent mounting the same share through different NFS versions
NFS 3 and non-Kerberos NFS 4.1 do not support the delegate user functionality that enables access to NFS
volumes using nonroot credentials. Typically, this is done on the NAS servers by using the no_root_squash
option
If the underlying NFS volume, on which files are stored, is read-only, make sure that the volume is exported as
a read-only share by the NFS server, or configure it as a read-only datastore on the ESXi host. Otherwise, the
host considers the datastore to be read-write and might not be able to open the files

91 Networking
92 Networking Troubleshooting
Troubleshooting networking is very similar to physical network troubleshooting
Start by validating connectivity
Look at network statistics from esxtop as well as the physical switch
Is it a network performance problem?
Validate throughput
Is CPU load too high?
Are packets being dropped?
Is the issue limited to the virtual environment, or is it seen in the physical environment too?
One of the biggest issues that VMware has observed is dropped network packets (discussed next)

93 Network Troubleshooting – Dropped Network Packets


Network packets are queued in buffers if the
Destination is not ready to receive them (Rx)
Network is too busy to send them (Tx)
Buffers are finite in size
Virtual NIC devices buffer packets when they cannot be handled immediately
If the queue in the virtual NIC fills, packets are buffered by the virtual switch port
Packets are dropped if the virtual switch port fills
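To see where the drops occur, esxtop's network view exposes per-port drop counters; a short sketch (interval and sample count are arbitrary):
  # Interactive: run esxtop, press n for the network view, and check %DRPTX and %DRPRX
  # Batch capture for later review
  esxtop -b -d 5 -n 24 > /tmp/esxtop_net.csv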

94 Example Problem 1 – Dropped Receive Packets


If a host’s droppedRx value > 0, there is a network throughput issue
Cause: High CPU utilization
Solutions: Increase CPU resources provided to the virtual machine; increase the efficiency with which the
virtual machine uses CPU resources
Cause: Improper guest operating system driver configuration
Solutions: Tune the network stack in the guest operating system; add virtual NICs to the virtual machine and
spread network load across them
95 Example Problem 2 – Dropped Transmit Packets
If a host's droppedTx value > 0, there is a network throughput issue
Cause: Traffic from the set of virtual machines sharing a virtual switch exceeds the physical capabilities of the
uplink NICs or the networking infrastructure
Solutions: Add uplink capacity to the virtual switch; move some virtual machines with high network demand
to a different virtual switch; enhance the networking infrastructure; reduce network traffic
96 Networking Best Practices
CPU plays a large role in performance of virtual networking. More CPUs, therefore, will generally result in
better network performance
Sharing physical NICs is good for redundancy, but it can impact other consumers if the link is overutilized.
Carefully choose the policies and how items are shared
Traffic between virtual machines on the same system does not need to go external to the host if they are on
the same virtual switch. Consider this when designing the network
Distributed vSwitches should be used whenever possible because they offer greater granularity on traffic flow
than standard vSwitches
vSphere Network and Storage I/O Control can dramatically help with contention on systems. This should be
used whenever possible
VMware Tools, and subsequently VMXNET3 drivers, should be used in all virtual machines to allow for
enhanced network capabilities
97 Questions
98 VMware vSphere 6.0 Knowledge Transfer Kit
VMware, Inc. Hillview Ave, Palo Alto, CA 94304
