VMware Real Time Scenario
Question 1: Can you describe the VMware Infrastructure that you are supporting?
What are the daily operations?
Hint: Interviewer wants to know whether you have live experience or not. Based on your
answer he/she will draw a couple of conclusions.
Answer:
I am part of the VMware team (5 to 10 members), working at L1/L2/L3/SME level and supporting a
Global Customer Account. Tell him/her that customer details are confidential and can’t be disclosed.
Technical details:
We have 3 vCenter servers configured for the Customer’s two Data Center locations. Two of
them are production and the other one is for Dev/Test purposes. Each production vCenter server has
multiple (say 5) clusters, and each cluster has 10+ ESXi servers.
We are supporting 150+ ESXi servers, which are running 1500+ Virtual Machines.
We have several 5.x versions of ESXi in the Infrastructure, but within any one cluster all hosts run
the same version. vCenter servers are at version 5.5 U2.
Daily tasks:
1) VMware Health Check Report – running the script and sending the reports to Management
(check my other post)
2) Checking the vCenter server console for alarms and alerts
3) Scheduling changes required for VMware tasks
4) Attending meetings with Architects
5) Working on VMware related incidents like backup failure, VM not pinging, ESXi host down,
vCenter service failure, etc.
Question 2: There is a Virtual Machine which is not reachable via ping/RDP and is hung in the
vCenter console. All options are grayed out in the VM options. How do you recover this Virtual Machine?
Hint: Interviewer wants to know your ESXi command line skills to troubleshoot the
scenario
Answer:
From the symptoms it is clear that no action is available from the vCenter server, except that you
can check Events & Tasks to understand whether any action was performed before the VM went into the
hung state. For example: backup jobs taking a snapshot, or an L1/L2 Admin firing multiple
Shutdown/Restart/Power-off tasks at the VM.
With these symptoms in mind, identify the ESXi host on which the VM is running and get the root
password, either from your Team Lead or from the tool that stores the shared ID password.
Step 1: Determining the virtual machine’s location
Determine the host on which the virtual machine is running. This information is available in the
virtual machine’s Summary tab in VI Client. Subsequent commands will be performed on, or
remotely reference, the ESXi host where the virtual machine is running.
Step 2: Open a console session to the ESXi host, either in the ESXi Shell or in a PuTTY (SSH)
session via Name/IP
Step 3: Get a list of running virtual machines, identified by World ID, UUID, Display Name, and
path to the .vmx configuration file
esxcli vm process list
Step 4: Power off one of the virtual machines from the list using this command:
esxcli vm process kill --type=[soft,hard,force] --world-id=WorldNumber
Three power-off methods are available – Soft is the most graceful, hard performs an immediate
shutdown, and force should be used as a last resort.
Step 5: Check the Virtual Machine process list again to make sure the VM no longer exists
esxcli vm process list
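Steps 3 to 5 can be sketched as a small helper that reads the output of esxcli vm process list, looks up the World ID by display name, and prepares the kill commands in escalation order (soft, then hard, then force). This is only a sketch: the sample output below is illustrative (host names, IDs and paths are made up), and the function names are hypothetical, not part of any VMware tool.

```python
# Illustrative output of `esxcli vm process list`; all values are made up.
SAMPLE_OUTPUT = """\
web01
   World ID: 1234567
   Process ID: 0
   VMX Cartel ID: 1234566
   UUID: 42 1a 2b 3c 4d 5e 6f 70-81 92 a3 b4 c5 d6 e7 f8
   Display Name: web01
   Config File: /vmfs/volumes/datastore1/web01/web01.vmx

db01
   World ID: 7654321
   Process ID: 0
   VMX Cartel ID: 7654320
   UUID: 42 0f 1e 2d 3c 4b 5a 69-78 87 96 a5 b4 c3 d2 e1
   Display Name: db01
   Config File: /vmfs/volumes/datastore1/db01/db01.vmx
"""


def parse_vm_process_list(text):
    """Return {vm_name: {field: value}} parsed from the listing text."""
    vms = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None          # a blank line ends the current VM block
            continue
        if not line.startswith(" "):
            current = {}            # an unindented line starts a new VM block
            vms[line.strip()] = current
        elif current is not None and ":" in line:
            key, value = line.strip().split(":", 1)
            current[key.strip()] = value.strip()
    return vms


def kill_commands(world_id):
    """Kill commands in escalation order: soft first, force as a last resort."""
    return [
        f"esxcli vm process kill --type={kind} --world-id={world_id}"
        for kind in ("soft", "hard", "force")
    ]


vms = parse_vm_process_list(SAMPLE_OUTPUT)
for command in kill_commands(vms["web01"]["World ID"]):
    print(command)
```

In practice you would run only the first command, re-run the process list, and move to the next type only if the VM is still listed.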
Question 3: What steps will you take when the vCenter service fails to start in your
Infrastructure? It is running version 5.1 or 5.5.
Hint: Interviewer wants to know your troubleshooting skills and confirm whether you have worked on
this issue at least once
Answer:
Validate if each troubleshooting step below is true for your environment. Each step provides
instructions or a link to a document that helps eliminate possible causes and take corrective
action as necessary.
Note: If you perform a corrective action in any of the following steps, attempt to restart the
VMware VirtualCenter Server service.
1. Verify that the VMware VirtualCenter Server service cannot be restarted. Try to restart the
service once again and check the logs for error messages
2. Verify that the configuration of the ODBC Data Source (DSN) used for connection to the
database for vCenter Server is correct.
Depending on your Infrastructure, the SQL/DB server is either on the vCenter server itself or on a
separate Production SQL Cluster
3. Verify there is enough free disk space on the vCenter Server. Also check disk space on the SQL DB
server: whether the DB is configured with dynamic size, whether the DB logs have grown, etc.
Sometimes you need to contact the SQL Team, who can perform advanced troubleshooting steps
4. Verify that ports 902, 80, 8080, 8443 and 443 are not being used by any other application.
If another application, such as Microsoft Internet Information Server (IIS) (also known as Web
Server (IIS) on Windows 2008 Enterprise), Routing and Remote Access Service (RAS), World Wide
Web Publishing Services (W3SVC), Windows Remote Management service (WS-Management) or the
Citrix Licensing Support service, is utilizing any of the ports, vCenter Server cannot start.
If you see an error similar to one of the following when reviewing the logs, another application may be
using the ports:
Failed to create http proxy: Resource is already in use: Listen socket: :<port>
Failed to create http proxy: An attempt was made to access a socket in a way forbidden by its access
permissions.
proxy failed on port <port>: Only one usage of each socket address (protocol/network address/port) is
normally permitted
5. Verify the health of the database server that is being used for vCenter Server. If the hard
drives are out of space, the database transaction logs are full, or
if the database is heavily fragmented, vCenter Server may not start.
Sometimes you need to contact the SQL Team, who can perform advanced troubleshooting steps
6. Verify the VMware VirtualCenter Server service is running with the proper credentials.
The vpxd.exe utility helps you update the DB credentials (see KB 1006482)
7. Verify that critical folders exist on the vCenter Server host
8. Verify that no hardware or software changes have been made to the vCenter server that may
have caused the failure. If you have recently made any changes to the vCenter server, undo
these changes temporarily for testing purposes
9. Before launching vCenter Server, ensure that the VMwareVCMSDS service is running
10. Verify that the vpxd.exe is present in C:\Program Files\VMware\Infrastructure\VirtualCenter
Server\vpxd.exe location. If this file is not present, reinstall vCenter Server
Your troubleshooting skills will be useful to identify error messages, and you can use Google to
find the nearest solution. Logical thinking is always required.
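The port check in step 4 can be sketched in a few lines run on the vCenter Server itself. This is a minimal sketch, assuming the usual vCenter 5.x ports (80, 443, 902, 8080, 8443); the function name is hypothetical. It simply tries to bind each port: if the bind fails, something else is already holding it.

```python
import socket

# Assumed list of vCenter 5.x ports; adjust to your own installation.
VCENTER_PORTS = (80, 443, 902, 8080, 8443)


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now.

    Any OSError counts as "not free". Note that on Unix-like systems a
    non-root user cannot bind ports below 1024, so run this with the same
    privileges as the service you are troubleshooting.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    for port in VCENTER_PORTS:
        state = "free" if port_is_free(port) else "IN USE or not bindable"
        print(f"port {port}: {state}")
```

Run it while the vCenter services are stopped; any port still reported as in use belongs to another application, which you can then identify with netstat -ano on Windows.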
need to create a new baseline, or edit this one to include those updates.
Choose the patch type you want to include in this baseline based on your ESX/ESXi hosts and
Let us discuss the various reasons for a host disconnect that the Interviewer expects to hear
1) Verify that the ESXi host is in a powered-on state – sounds silly, but most people forget
to check the server status via ILO/DRAC/RIB, etc.
2) Verify that network connectivity exists from vCenter Server to the ESXi host with both the IP
and the FQDN – this is the VMware Administrator’s usual first suspect.
3) Verify that the ESXi host can be reconnected, or if reconnecting the ESXi host resolves the
issue – simple, and sometimes it resolves the problem
4) Verify that the ESXi host is able to respond back to vCenter Server at the correct IP address.
If vCenter Server does not receive heartbeats from the ESXi host, it goes into a not responding
state. To verify if the correct Managed IP Address is set, see “Verifying the vCenter Server
Managed IP Address” and “ESXi 5.0 hosts are marked as Not Responding 60 seconds after being
added to vCenter Server” (known issue).
5) ESXi/ESX host disconnects from vCenter Server after adding or connecting it to the inventory
(VMware KB2040630)
6) Verify that you can connect from vCenter Server to the ESXi host on TCP/UDP port 902. If
the host was upgraded from version 2.x and you cannot connect on port 902, then verify that you
can connect on port 905.
Use a simple Telnet command to check the port status
7) Verify if restarting the ESXi Management Agents resolves the issue – You ran these
Answer 9: You can start with the definition of a PSOD to impress him/her, followed by the
troubleshooting steps
“A Purple Screen of Death (PSOD) is a diagnostic screen with white type on a purple background
that is displayed when the VMkernel of an ESX/ESXi host experiences a critical error, becomes
inoperative and terminates any virtual machines that are running”
You need to highlight the important step of capturing log file information after the PSOD occurred.
To resolve this issue, extract the log file from a vmkernel-zdump file using a command line utility
on the ESX or ESXi host. This utility differs for different versions of ESX or ESXi.
For ESXi 3.5, ESXi/ESX 4.x and ESXi 5.x, use the esxcfg-dumppart utility:
# esxcfg-dumppart -L vmkernel-zdump-filename
To extract the log file from a vmkernel-zdump file:
1. Find the vmkernel-zdump file in the /root/ or /var/core/ directory:
# ls /root/vmkernel* /var/core/vmkernel*
/var/core/vmkernel-zdump-073108.09.16.1
2. Use the vmkdump or esxcfg-dumppart utility to extract the log. For example:
# vmkdump -l /var/core/vmkernel-zdump-073108.09.16.1
created file vmkernel-log.1
# esxcfg-dumppart -L /var/core/vmkernel-zdump-073108.09.16.1
created file vmkernel-log.1
3. The vmkernel-log.1 file is plain text, though it may start with null characters. Focus on the end
of the log, which is similar to:
VMware ESX Server [Releasebuild-98103]
PCPU 1 locked up. Failed to ack TLB invalidate.
frame=0x3a37d98 ip=0x625e94 cr2=0x0 cr3=0x40c66000 cr4=0x16c
es=0xffffffff ds=0xffffffff fs=0xffffffff gs=0xffffffff
eax=0xffffffff ebx=0xffffffff ecx=0xffffffff edx=0xffffffff
Note: The file name created for the log in this example is vmkernel-log.1. If another file with the
same name already exists, the new file is created with the number suffix incremented.
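Since the extracted log may start with null characters and the interesting part is at the end, a quick way to read it after copying the file off the host is sketched below. The function name is hypothetical and the file name in the usage line is simply the one from the example above.

```python
def log_tail(raw, lines=20):
    """Return the last non-empty lines of an extracted vmkernel log.

    raw is the file content as bytes; leading null characters are dropped
    before decoding so that a text viewer or parser does not choke on them.
    """
    text = raw.lstrip(b"\x00").decode("utf-8", errors="replace")
    non_empty = [line for line in text.splitlines() if line.strip()]
    return non_empty[-lines:]


if __name__ == "__main__":
    # Assumes vmkernel-log.1 was copied into the current directory.
    with open("vmkernel-log.1", "rb") as handle:
        for line in log_tail(handle.read()):
            print(line)
```

The last few lines normally carry the panic reason (for example the "PCPU 1 locked up" message shown above), which is what the hardware vendor or VMware support will ask for first.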
Most of the time it will be a hardware issue, and you need to open a case with the hardware vendor
(in this case HP). Based on the findings, you need to replace the hardware devices or upgrade the
firmware as suggested by the hardware vendor, via the ITIL Change Management process.
In some cases it may be a problem with software installed on the ESXi server, like additional agents
for monitoring both software & hardware, additional VIBs added for Storage, etc.
Finally, if you want to become expert enough to analyze the logs on your own, VMware has a good KB
Article on this. It’s rare that an Interviewer asks about debugging this issue, but he/she wants to
check your understanding of the procedure followed in case of a PSOD.
Question 10: How do you troubleshoot P2V Failure Issues in your Infrastructure? (P2V =
Physical to Virtual)
Answer: There is a lot of discussion about which Physical servers are good candidates for VMware
Infrastructure, like Exchange, SQL or Cluster, etc.
The Interviewer will also be interested to hear how you judge whether a Physical server is a
good candidate for Virtualization.
The answer is that we first need to analyze 3 months of data from a Performance reporting tool.
If you notice that CPU & Memory utilization is at 80%, then that Physical server is most likely
not well suited for VMware Infrastructure.
If the Server utilization is less than 70% then you can recommend it for VMware Infrastructure.
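The rule of thumb above can be written down as a tiny helper. This is only a sketch of the thresholds stated in the answer: taking the higher of the CPU and Memory averages, and treating the 70 to 80% band (which the answer does not cover) as "review", are my assumptions, and the function name is hypothetical.

```python
from statistics import mean


def p2v_recommendation(cpu_samples, mem_samples):
    """Classify a physical server from utilization samples given in percent.

    The samples would come from about 3 months of performance reports.
    Assumption: the busier of the two resources decides the outcome.
    """
    peak_average = max(mean(cpu_samples), mean(mem_samples))
    if peak_average >= 80:
        return "not suitable"
    if peak_average < 70:
        return "recommend"
    return "review"  # 70-80% is not covered by the rule; decide case by case


print(p2v_recommendation([35, 40, 45], [55, 60, 65]))  # averages 40 and 60
```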
Once the server is selected for P2V and you have started the process (hopefully with Pre & Post P2V
checklists), you may run into issues. Here is a good checklist to fix P2V problems.
To eliminate permission issues, always use the local administrator account instead of a
domain account.
Note: Disable UAC for Windows Vista, Windows 7, or Windows 8 prior to converting.
To eliminate DNS problems, use IP addresses instead of host names.
Ensure that you do not choose partitions that contain any vendor specific Diagnostic
Partitions before proceeding with a conversion.
To reduce network obstructions, convert directly to an ESX host instead of vCenter Server as
the destination.
Notes: This is only an option in VMware vCenter Converter Standalone
If you are unable to convert directly to an ESX host in vCenter Server 5.0, see vCenter
Converter Standalone 5.0 errors when an ESXi 5.0 host is selected as a destination.
Check KB2012310
VMware vCenter Converter Standalone has many more options available to customize your
conversion. If you are having issues using the Converter Plug-in inside vCenter Server,
consider trying the Standalone version.
If a conversion fails using the exact size of hard disks, decrease the size of the disks by at
least 1MB. This forces VMware Converter to do a file level copy instead of a block level copy,
which can be more successful if there are errors with the volume or if there are file-locking
issues.
Make sure there is at least 500MB of free space on the machine being converted. VMware
Converter requires this space to copy data.
Shut down any unnecessary services, such as SQL, antivirus programs, and firewalls. These
services can cause issues during conversion.
Run a check disk on the volume before running a conversion as errors on disk volumes can
cause VMware Converter to fail.
Do not install VMware Tools during the conversion. Install VMware Tools after you confirm
that the conversion was successful.
Do not customize the new virtual machine before conversion.
Ensure that these services are enabled:
Workstation Service
Server Service
TCP/IP NetBIOS Helper Service
Volume Shadow Copy Service
Check that the appropriate firewall ports are opened
Check that boot.ini is not looking for a Diagnostic/Utility Partition that no longer exists.
If you are unable to see some or all of the data disks on the source system, ensure that you
are not using GPT on the disk partitions.
In Windows XP, disable Windows Simple File Sharing. This service has been known to cause
issues during conversion.
Unplug any USB, serial/parallel port devices from the source system. VMware Converter may
interpret these as additional devices, such as external hard drives which may cause the
conversion to fail.
If the source machine contains multiple drives or partitions and you are having issues failing
on certain drives, consider converting one drive or partition at a time.
Verify that there are no host NICs or network devices in the environment that have been
statically configured to be at a different speed or duplex. This includes settings on the source
operating system, switches and networking devices between the source and destination
server. If this is the case, Converter sees the C: drive but not the D: drive.
If you are using a security firewall or Stateful Packet Inspecting (SPI) firewall, check firewall
alerts and logs to make sure the connection is not being blocked as malicious traffic.
If you have static IP addresses assigned, assign the interfaces DHCP addresses prior to
conversion.
If the source server contains a hard drive or partition larger than 256GB, ensure that the
destination datastore’s block size is 2MB, 4MB, or 8MB, and not the default 1MB size. The
1MB default block size cannot accommodate a file larger than 256GB.
Clear any third-party software from the physical machine that could be using the Volume
Shadow Copy Service (VSS). VMware Converter relies on VSS, and other programs can
cause contention.
Disable mirrored or striped volumes. Mirrored or striped volumes cannot be converted.
Verify that the VMware Converter agent is installed on the source machine. It may not be if
the conversion fails right away.
Verify that DNS and reverse DNS lookups are working. It may be necessary to make entries
into the local hosts file on source machine. Use IP addresses, if possible.
Run msconfig on the source server to reduce the number of services and applications running
at startup. Only Microsoft services and the VMware Converter Service should be running.
Inject VMware SCSI drivers into the machine before conversion. Windows tries to Plug-n-Play
the new SCSI Controller, and Windows may fail if the proper drivers are not installed.
If you customized permissions in your environment, ensure that local administrator has rights
to all files, directories, or registry permissions before conversion.
Uninstall any UPS software. This has been known to cause issues after Conversion.
Ensure that you do not have any virtual mounted media through an ILO- or DRAC-type
connection. Converter can misinterpret these as convertible drives, and fails upon detecting
them. As a precaution, disconnect your ILO or DRAC to prevent this issue.
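The block-size item in the checklist above follows the VMFS-3 limits, where each block size caps the largest single file (and therefore the largest virtual disk): 1MB gives 256GB, 2MB gives 512GB, 4MB gives 1TB and 8MB gives 2TB. A small lookup makes the rule easy to apply before a conversion. The function name is hypothetical, and the table applies to VMFS-3 datastores only; VMFS-5 uses a unified 1MB block size and does not have this restriction.

```python
# VMFS-3: block size in MB -> approximate maximum file size in GB.
VMFS3_MAX_FILE_GB = {1: 256, 2: 512, 4: 1024, 8: 2048}


def min_vmfs3_block_size_mb(disk_gb):
    """Smallest VMFS-3 block size whose file-size limit fits the disk."""
    for block_mb in sorted(VMFS3_MAX_FILE_GB):
        if disk_gb <= VMFS3_MAX_FILE_GB[block_mb]:
            return block_mb
    raise ValueError(f"{disk_gb}GB exceeds the VMFS-3 maximum file size")


for size_gb in (200, 300, 600):
    print(f"{size_gb}GB disk -> {min_vmfs3_block_size_mb(size_gb)}MB block size")
```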
Your answer should also cover log information, which will demonstrate your real time experience
VMware Converter logs
There are also several ways to diagnose issues by viewing the VMware Converter logs. The logs
can contain information that is not apparent from error messages. In newer versions of VMware
Converter, you can use the Export Log Data button. Otherwise, logs are typically stored in these
directories:
Windows NT, 2000, XP, and 2003:
C:\Documents and Settings\All Users\Application Data\VMware\VMware Converter Enterprise\Logs
C:\WINDOWS\Temp\vmware-converter
C:\WINDOWS\Temp\vmware-temp
Windows Vista, 7, and 2008:
C:\Users\All Users\Application Data\VMware\VMware Converter Enterprise\Logs
Windows 8 and Windows 2012:
C:\ProgramData\VMware\VMware vCenter Converter Standalone\logs
Note: In order to access this location in Windows Vista, 7, or 2008, you may need to go into the
folder options and ensure that Show Hidden Files is enabled and that Hide Protected Operating
System Files is disabled.
C:\WINDOWS\Temp\vmware-converter
C:\WINDOWS\Temp\vmware-temp
Windows NT and 2000:
C:\WINNT\Temp\vmware-converter
C:\WINNT\Temp\vmware-temp
Linux:
$HOME/.vmware/VMware vCenter Converter Standalone/Logs
/var/log/vmware-vcenter-converter-standalone
Question 11: VMware VSAN is popular nowadays; do you have any experience with it?
Can you brief the pre-requisites of VMware VSAN? Have you ever faced any issues with a VSAN
setup? If yes, how did you recover from them?
Most of us have not had the chance to install & configure VMware VSAN in a Customer Infrastructure,
but soon you will be assigned to that role as Storage slowly moves to the VSAN world. Virtual SAN 6.0
comes with a Health Services plugin. This feature checks a range of different health aspects of
Virtual SAN, and provides insight into the root cause of many potential Virtual SAN issues. The
recommendation when triaging Virtual SAN is to begin with the Virtual SAN Health Services.
Once an issue is detected, the Health Services highlights the problem and directs administrators
to the appropriate VMware knowledge base article to begin problem solving.
Please refer to the Virtual SAN Health Services Guide for further details on how to get the
Health Services components, how to install them and how to use the feature for troubleshooting
common Virtual SAN issues. It is better to know the pre-requisites before entering
troubleshooting mode. Let’s review a couple of pre-requisites for VMware Virtual SAN (the list is
really big, so try to remember only the important ones)
Once you know the pre-requisites, the next step is collecting Virtual SAN (VSAN) support
logs. Virtual SAN support logs are contained in a normal ESXi support bundle in the form of
VSAN traces. The VSAN support logs are gathered automatically by gathering the ESXi support
bundle for the hosts. As VSAN is distributed across multiple ESXi hosts, VMware recommends
you gather the ESXi support logs for all hosts configured for VSAN. As Virtual SAN is a
software-based storage product, it is entirely dependent on the proper functioning of its underlying
hardware components such as the network, the storage I/O controller, and the storage devices
themselves. As Virtual SAN is an enterprise storage product, it can put an unusually demanding
load on supporting components and subsystems, exposing flaws and gaps that might not be
seen with simplistic testing or other, less-demanding use cases. Indeed, most Virtual SAN
troubleshooting exercises involve determining whether or not the network is functioning properly,
or whether the Virtual SAN VMware Compatibility Guide (VCG) has been rigorously followed.
As Virtual SAN uses the network to communicate between nodes, a properly configured and fully
functioning network is essential to operations. Many Virtual SAN errors can be traced back to
things like improperly configured multicast, mismatched MTU sizes and the like. More than
simple TCP/IP connectivity is required for Virtual SAN. Virtual SAN uses server-based storage
components to recreate the functions normally found in enterprise storage arrays. This
architectural approach demands a rigorous discipline in sourcing and maintaining the correct
storage I/O controllers, disks, flash devices, device drivers and firmware, as documented in the
VMware Virtual SAN VCG.
Finally, when you attempt to configure or deploy VMware Virtual SAN (VSAN), you may experience
these symptoms:
VSAN configuration fails
In the vSphere Web Client, you see an error similar to:
Found another host participating in the VSAN service which is not a
member of this host’s vCenter cluster
Resolution
To resolve this issue, check all ESXi host nodes in the VSAN cluster and ensure that their
configuration is correct and all VSAN hosts are in the same network subnet.
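The "same network subnet" part of this resolution is easy to verify once you have collected the VSAN VMkernel IP address of every host. The sketch below uses hypothetical host names and addresses, and assumes you know the prefix length of the VSAN network; it reports every host whose address falls outside the subnet of the first host.

```python
import ipaddress


def hosts_outside_subnet(vsan_ips, prefix_len):
    """Return the hosts whose VSAN IP is not in the first host's subnet.

    vsan_ips maps host name -> VSAN VMkernel IP address (as a string).
    """
    items = list(vsan_ips.items())
    reference = ipaddress.ip_interface(f"{items[0][1]}/{prefix_len}").network
    return [
        host for host, ip in items
        if ipaddress.ip_address(ip) not in reference
    ]


# Hypothetical inventory: esx03 was configured on the wrong network.
inventory = {
    "esx01": "10.0.10.11",
    "esx02": "10.0.10.12",
    "esx03": "10.0.20.13",
}
print(hosts_outside_subnet(inventory, 24))
```

An empty list means all hosts share one subnet; any name that is printed is the first place to look when the error above appears.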
Question 12: Have you ever fixed metadata issues with ESXi datastores? Can you list the
commands or tools used for such activity? Is there any pre-requisite checklist before you use
such tools? Help me to understand the troubleshooting procedure for this scenario.
Answer: Once the Interviewer is confident in your VMware basics, he/she will start asking
questions on advanced troubleshooting topics. You need to answer such questions wisely by
adding a Customer scenario to the answer. Let me help you to answer it properly with this
post. You can start the answer like this: recently one of our Customers had a SAN outage, after which
a couple of datastores did not come online, resulting in a Major Incident/Priority One ticket. I
worked with the VMware support team and used the VOMA (vSphere On-disk Metadata Analyzer) tool to
check VMFS metadata consistency. This utility scans the VMFS volume metadata and highlights
any inconsistencies, which you then need to work through with the SAN team to fix quickly in real
time scenarios.
Since ESXi 5.x it is possible to check VMFS for metadata inconsistency with a tool called VOMA
(vSphere On-disk Metadata Analyzer). With VOMA you can check VMFS3 and VMFS5
datastores. Please note that the tool can only identify problems, as it runs in read-only mode,
so it does not help you to fix detected errors.
Reasons to use VOMA:
If you experience a SAN outage
After a RAID rebuild
After a disk replacement
After a partition table update
Reports of metadata errors in the vmkernel.log file
If you cannot modify, erase or access files on a VMFS datastore that is not in use by any other
host
Before you start VOMA from the CLI of your ESXi host, follow these guidelines:
Shut down all virtual machines running on the VMFS datastore (or migrate them)
Make sure that the VMFS volume is not in use by other hosts (best practice: unmount the
datastore on the other hosts)
Make sure that the datastore is not in use by vSphere HA for heartbeating
Make sure that the datastore is not in use by other features like Storage I/O Control
Make sure that the volume is not a multi-extent volume
Now log on to your ESXi host and let’s take a look at the available parameters of VOMA (voma -h)
Procedure:
To perform a VOMA check on a VMFS datastore and send the results to a specific log file, the
command syntax is:
voma -m vmfs -d /vmfs/devices/disks/naa.00000000000000000000000000:1 -s /tmp/analysis.txt
where naa.00000000000000000000000000:1 is replaced with the LUN NAA ID and partition to
be checked. Note the “:1” at the end: this is the partition number containing the datastore, and
it must be specified (see the note below). As an advisory, if you run voma more than once, add the
NAA ID and a time stamp to the output log file name, e.g.
-s /tmp/naa.00000000000000000000000000:1_analysis_<<hhmm>>.txt
Note: VOMA must be run against the partition and not the device. If VOMA is run against a
device, it produces an error similar to:
Error: Missing LVM Magic. Disk doesn’t have a valid LVM Device
Error: Failed to Initialize LVM Metadata
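To avoid the two mistakes described above (running VOMA against the device instead of the partition, and overwriting an earlier log), the command line can be assembled by a small helper before you paste it into the ESXi shell. This is a sketch only: the function name is hypothetical, and the NAA ID in the usage line is a placeholder, not a real LUN.

```python
import re
from datetime import datetime


def voma_check_command(device_partition, now=None):
    """Build a voma command with a time-stamped log file name.

    device_partition must end in ":<partition number>", e.g. "naa.xxxx:1";
    a bare device name is rejected because VOMA would fail with
    "Missing LVM Magic".
    """
    if not re.search(r":\d+$", device_partition):
        raise ValueError("specify the partition, e.g. naa.<id>:1")
    stamp = (now or datetime.now()).strftime("%H%M")
    return (
        f"voma -m vmfs -d /vmfs/devices/disks/{device_partition} "
        f"-s /tmp/{device_partition}_analysis_{stamp}.txt"
    )


# Placeholder NAA ID, as in the syntax example above.
print(voma_check_command("naa.00000000000000000000000000:1"))
```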
Question 13: As part of a Data Center Network devices upgrade/change, someone changed the
vCenter IP Address. How do you tackle this Scenario as a VMware Administrator? What is the
Technical plan that you will follow for this Change Record? (ITIL Process)
Answer:
Hint: Interviewer is looking at your Technical direction/plan along with ITIL Change management
procedures
We may think that changing the IP Address is an easy job: go to the vCenter VM Console (in most
cases) [OR] the Remote console for Physical servers, and modify the Network Adapter Settings.
But what happens to your ESXi servers, NSX VMs and Update Manager? Will they
communicate with your vCenter server on the new IP directly, without any modification? Here is the
detailed Technical Plan to answer this question.
Create backups of the vCenter Server & underlying SQL database as the Backup Plan
Set DRS to manual mode to avoid anything moving around
Identify the ESXi host running the vCenter VM and connect directly to that host with the
vSphere Client. Do not forget that your vCenter is going to disconnect and you can’t manage it
via the vSphere Client anymore
Close any sessions you have open to the vCenter Server (Web Client, vSphere Client
sessions)
Open a console window to the vCenter Server by way of the ESXi host.
Stop all VMware related services
Change the IPv4 address and IPv4 gateway as per the new Networking configuration
Put DRS back to fully automated (optional, based on your setup)
Uninstall Update Manager software from the VM (sometimes it’s installed on a server other than
vCenter)
Install Update Manager and point it to the new vCenter Server IP Address
NOTE: There is an easy method to update the vCenter IP Address in Update Manager via the
command line (we will discuss it in future posts)
Update the vCenter Managed IP Address with the procedure below
NSX requires your attention, as vCenter re-registration is a complex procedure – leave this for
Network Specialists to provide a technical plan
Disconnect the host from vCenter to flush out the database entry
Reconnect the ESXi host so it uses the new vCenter IP Address for communication and agent
installation
Finally, I tried to bring together most of the items related to a vCenter IP Address change from my
Experience and knowledge, but do not treat this as a final Technical Plan. You need to refer to your
own Infrastructure to plan Change Records (CRs) properly as per ITIL Procedure.
Question 14: Have you ever fixed vSphere ESX Agent Manager (EAM) failure issues with an NSX
Solution? What do you think about the error message below?
In the /var/log/netcpa.log file on the ESXi host, you see entries similar to:
Netcpa error: error netcpa[FFCFFB70] [Originator@6876 sub=Default]
Failed to get host id, returned , len 0
Answer: NSX has become the core Network solution for most Cloud Companies. IBM SoftLayer and
VMware became Strategic Partners to deliver VMware solutions for SoftLayer customers. The
Interviewer wants to check your Host preparation skills for NSX usage in this question. Host
preparation involves a couple of steps, including installing the VIB modules
esx-dvfilter-switch-security, esx-vsip & esx-vxlan.
Host preparation is the process in which the NSX Manager:
1) Installs NSX kernel modules on ESXi hosts that are members of vCenter clusters
2) Builds the NSX control-plane and management-plane fabric
NSX kernel modules packaged in VIB files run within the hypervisor kernel and provide services
such as distributed routing, distributed firewall, and VXLAN bridging capabilities. To prepare your
environment for network virtualization, you must install network infrastructure components on a
per-cluster level for each vCenter server where needed. This deploys the required software on all
hosts in the cluster. When a new host is added to this cluster, the required software is
automatically installed on the newly added host. vSphere ESX Agent Manager automates the
process of deploying and managing vSphere ESX agents. Once you are clear on the basics, you can
explain the possible reasons for, and the troubleshooting of, VIB deployment failures via VMware
ESX Agent Manager (EAM), where Installation Status shows Not Ready on the Clusters and Hosts.
If you can speak or explain 4 or more steps to the Interviewer, then he/she will be happy to
Question 15: Can you describe any major issue that you fixed in a VMware Infrastructure?
How did you handle that situation? Can you list the steps that you followed to fix that problem?
Answer: It’s been a long time since I wrote a post on real time scenarios, and today I’m going to
help you guys with one of the common Interview questions for this “Thanksgiving Day”. It’s always
good to pick Storage related issues, which create high impact for both Physical & VMware
Infrastructures. I’m going to take the scenario of a couple of VMs not responding, including the
vCenter Server, which is also running as a Virtual Machine.
You can start the narration like this: you got a call from the Helpdesk that some VMs are
inaccessible in the vCenter server. When you connected to the vCenter server, you couldn’t access
the consoles of the VMs as they were stuck at a black screen. From the screenshot it’s clear that
CPU & Memory usage is high for a couple of hosts in the Cluster. You wanted to understand the reason
for the CPU & Memory utilization on the ESXi hosts and ran the “esxtop” command to find out which
VM/process was creating the most CPU/Memory utilization. If you are new to esxtop, then follow this
article to know the usage of this command. As the issue was related to only a couple of virtual
machines, you downloaded the vmware.log file to understand more about the issue from the VM’s point
of view. You noticed some latency when downloading the log files. While this was going on, the
vCenter VM went into an unresponsive state, which made the situation worse. esxtop & vmware.log did
not yield much information, and management wanted you to fix the vCenter VM issue first. The SQL DB
server, which holds the vCenter & Update Manager DBs, was also running as a Virtual Machine. You
had also installed PernixData (acquired by Nutanix) to improve storage I/O Performance for VMs.
These factors make the situation complex, and the diagram below helps you to understand the
scenario better.
Now let’s talk about the resolution step by step and how you found the issue with Storage. The Help
desk reported that another set of VMs was not reachable on the network and hung at a black
screen when the VM console was opened through a direct connection to the ESXi server. It
means the issue was really growing and impacting more business functions like SharePoint, Exchange,
Citrix, Finance Applications, etc. Management declared this an MI (Major Incident) and
requested you to join the bridge call along with other Technical teams like Storage, Network &
Incident escalation managers. This is going to be a collaborative effort rather than you fixing the
issue alone from the VMware team. The latency when downloading the file and the black screen on the
VM console draw your attention to Storage related problems. esxtop has specific switches for
understanding storage related problems; check this article to know more details. For this
escalation you need to know the KAVG & DAVG values and thresholds as listed below.
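As a rough illustration of how these two esxtop counters are read: DAVG is the latency seen at the device (HBA, fabric, array), KAVG is the time an I/O spends inside the VMkernel, and the guest-visible GAVG is their sum. The sketch below uses warning levels of about 25 ms for DAVG and 2 ms for KAVG; these numbers are commonly quoted community guidance and are my assumption here, not values taken from this post, so check them against current VMware documentation. The function name is hypothetical.

```python
# Assumed warning levels in milliseconds (commonly quoted guidance only).
DAVG_WARN_MS = 25.0  # device latency: HBA, fabric and storage array
KAVG_WARN_MS = 2.0   # kernel latency: time spent inside the VMkernel


def classify_latency(davg_ms, kavg_ms):
    """Return (gavg_ms, findings) for one esxtop disk sample.

    findings lists which layer exceeds its assumed warning level:
    "device" points towards the array or fabric, "kernel" towards the host.
    """
    gavg_ms = davg_ms + kavg_ms  # guest latency is the sum of both parts
    findings = []
    if davg_ms > DAVG_WARN_MS:
        findings.append("device")
    if kavg_ms > KAVG_WARN_MS:
        findings.append("kernel")
    return gavg_ms, findings


print(classify_latency(5.0, 12.0))   # low DAVG, high KAVG
print(classify_latency(40.0, 0.5))   # high DAVG, low KAVG
```

A sample with normal DAVG but high KAVG suggests looking at the host side (queues, drivers, host-level storage software) even when the array team reports healthy statistics.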
So from your troubleshooting it’s clear that there is an ongoing Storage related problem in the
Infrastructure: it shows in esxtop, and the VM consoles are at a black screen. You requested the
Storage team to validate the VNX configuration and performance, to check whether they were also
seeing any alerts or issues from their side. They came back quickly with an update that ALL stats
looked good for read & write operations. Management was asking what the problem was; you
were confident that there was a Storage issue, but the Storage team couldn’t see it on the VNX side.
So you requested the Storage team to give some recommendations, and they came back with a plan to
switch the Storage Processor from A to B. They executed this plan with
management approval, but there was no change in the situation. Next they wanted to reboot the
storage processors one by one, which would have major impact if they failed to come online after the
reboot. Management agreed, as ALL the production VMs were down at that time. The Storage team
finished rebooting both processors successfully in 1 hour, but that didn’t change anything in the
VMs’ status. This means the Storage team had performed all their troubleshooting, including
rebooting SPA & SPB. Now management wanted to know the next action plan from the VMware
Infrastructure side.
You are confident that there is a Storage problem, but the VNX looks good, so there should be a
problem with another Storage layer, which is PernixData. Upon investigating the
PernixData process further, you found that there are a lot of pending I/O jobs waiting on one of
the ESXi hosts. You killed the pernix process on that specific host, which brought the services
back online to a normal state. Once the VMs became normal, you recommended that Management
perform a clean ESXi reboot to avoid any immediate potential outage in Production. They agreed to
it, a clean reboot was performed for each ESXi host, and the conference call was closed after
validating the Applications with the end users. (UAT – User Acceptance Test)
Resolution: PernixData helps to improve Storage I/O performance by offloading that
service to the ESXi host’s local cache; however, in this situation it was holding a lot of jobs,
which created the above situation. Killing the pernix process from the ESXi host console and a
clean ESXi reboot brought the situation back to a normal state. Hope this helps you to answer
Interview questions in a better way, and be social to share the knowledge.
Question 16: Can you explain the options that you see when you format a new Data store in
vSphere 6.5 via the Web client?
Answer: vSphere 6.5 was released some time ago, and Interview panel members have now started
asking questions related to 6.5, like differences and what’s new in 6.5. Today let me help you to
understand the Storage part. In real time, there are a couple of scenarios, like a new VM build
project, migrating to a new Storage Array, or increasing a current VM disk size, that require
additional storage in the VMware Cluster. The VMware administrator is responsible for making sure
that he/she understands the over-commit challenges with a Data store. For example, if you have a
2 TB Data store and you already allocated 1.8 TB of space, you need to consider thin vs thick
disks before assigning the leftover 200 GB of storage space. To keep it simple, the rule of thumb
for a VMware administrator is not to allocate more space than is available in the Data store for
Production Infrastructure.
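The 2 TB example above is simple arithmetic, and a short Python sketch makes the over-commit check explicit. The function name datastore_headroom and the 1200 GB "used" figure are assumptions for illustration only; the 2000 GB and 1800 GB numbers follow the example in the text.

```python
def datastore_headroom(capacity_gb: float, provisioned_gb: float, used_gb: float) -> dict:
    """Summarise allocation on one datastore.

    provisioned_gb: sum of the configured sizes of all virtual disks (thin or thick)
    used_gb:        space actually consumed on the datastore right now
    """
    return {
        "unallocated_gb": capacity_gb - provisioned_gb,
        "free_gb": capacity_gb - used_gb,
        "overcommit_ratio": provisioned_gb / capacity_gb,
        "overcommitted": provisioned_gb > capacity_gb,
    }


if __name__ == "__main__":
    # 2 TB datastore with 1.8 TB already provisioned (used space is hypothetical).
    print(datastore_headroom(2000, 1800, 1200))
    # Adding one more 300 GB thin disk pushes provisioning past capacity.
    print(datastore_headroom(2000, 2100, 1200))
```

With thin disks the datastore can look healthy by free space while already being over-committed by provisioned space, which is why both numbers are reported.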
You also need to understand the performance challenges with the THIN on THIN vs THICK on
THICK options selected by the Storage and VMware Administrators. Do not forget that the Storage
administrator can also create THIN disks to allocate the storage for the VMware Cluster. Let’s talk
about the options available while you format the Data store in the VMware cluster, after the
Storage administrator has confirmed the LUN allocation to all ESXi servers. First we will see the
standard steps performed by the VMware Administrator for a VMFS type LUN, which has options to
format with version 5 or 6, followed by NFS & VVOL based storage. You can find the differences
between these versions in my previous post “VMFS 5 vs VMFS 6“. Older versions (before vSphere
6.0) won’t show the VVol option, and the Interviewer’s expectation is that you will highlight that
point and describe its functionality. VMFS 6 enables advanced format (512e) and automatic space
reclamation support. You can mention the options Space Reclamation Priority and Granularity, which
are not available in previous versions. You can also mention NFS versions 3 & 4.1, which puts your
answer in real-time format. The Interviewer is keen on the new options available or enabled with
vSphere 6.5, so make sure you mention all of these in your answer to make a better impression.
VMFS LUN
NFS LUN
VVOL Based Storage
If you are new to the VVol term, here is a quick explanation. Virtual Volumes (VVols) is a
new integration and management framework that virtualizes SAN/NAS arrays, enabling a more
efficient operational model that is optimized for virtualized environments and centered on the
application instead of the infrastructure. Virtual Volumes simplifies operations through policy-
driven automation that enables more agile storage consumption for virtual machines and
dynamic adjustments in real time, when they are needed. It simplifies the delivery of storage
service levels to individual applications by providing finer control of hardware resources and
native array-based data services that can be instantiated with virtual machine granularity. With
Virtual Volumes (VVols), VMware offers a new paradigm in which an individual virtual machine
and its disks, rather than a LUN, becomes a unit of storage management for a storage
system. Virtual volumes encapsulate virtual disks and other virtual machine files, and natively
store the files on the storage system.
Question 17: Can you draw on the whiteboard/paper a high level architecture design that
details a cluster of 4 ESXi hosts, connected to a vCenter Server, that has 150+ VMs balanced
across the hosts? There are 2 network switches, vSS-1 & vDS-2, and the ESXi servers are
connected to them for VM & management traffic via VLANs. The hosts are connected to four 500
GB shared storage LUNs that are presented via Fibre Channel.
Answer: If you answer the theory questions correctly, then the Interviewer is going to check your
understanding of vSphere Architecture. This is a tough question, as it involves a lot of
factors to explain in a specific answer. However, I will try to cover my answer with some
assumptions to keep it simpler. When I cleared my Data center design exam, there were a lot of
these examples, which we prepared in the Google+ community and shared with other members.
There are two phases for any design, the Logical & Physical diagrams. The logical diagram talks
about the high level design but won’t cover any specific product details. The physical diagram
covers most of the product specific details like Cisco UCS, storage and software versions.
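Before drawing, it helps to turn the numbers in the question into rough sizing figures. The sketch below is only an illustration in Python: it treats "150+" as exactly 150 VMs, assumes the VMs are spread evenly, and the helper name vms_per_host is hypothetical.

```python
import math

HOSTS = 4          # ESXi hosts in the cluster
VMS = 150          # lower bound from the question ("150+")
LUNS = 4           # shared Fibre Channel LUNs
LUN_SIZE_GB = 500  # size of each LUN


def vms_per_host(total_vms: int, hosts: int, failed_hosts: int = 0) -> int:
    """Worst-case VM count per surviving host if the VMs are spread evenly."""
    surviving = hosts - failed_hosts
    if surviving <= 0:
        raise ValueError("no hosts left to run the VMs")
    return math.ceil(total_vms / surviving)


total_storage_gb = LUNS * LUN_SIZE_GB

if __name__ == "__main__":
    print(vms_per_host(VMS, HOSTS))      # all four hosts healthy
    print(vms_per_host(VMS, HOSTS, 1))   # one host down (N+1 check)
    print(total_storage_gb, round(total_storage_gb / VMS, 1))
```

The per-VM storage figure that falls out of this is small, which is a good talking point about thin provisioning when you present the design.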
What Is a Technical Design?
A technical design is a way of communicating an end product or solution. By creating a
technical design, a group of people can work together to create a final solution. A design
methodology is a multi-phase process used to direct a technical design process. The process
has the following characteristics:
• Iterative
• Involves other people
• Helpful and necessary to the success of a project
Run the command: vsan.disks_info ~/computers/Clustername/hostname
/10.30.40.61/CICD-SDDC/computers> vsan.disks_info ~/computers/vDC-NSX/hosts/10.30.40.51
2017-05-16 14:52:55 -0700: Gathering disk information for host 10.30.40.51
2017-05-16 14:53:08 -0700: Done gathering disk information
Disks on host 10.30.40.51:
+---------------------------------------+-------+--------+------------------------------------------------------------------------+
| DisplayName                           | isSSD | Size   | State                                                                  |
+---------------------------------------+-------+--------+------------------------------------------------------------------------+
| Local ATA Disk (naa.55cd2e404c09019c) | SSD   | 223 GB | eligible                                                               |
| ATA INTEL SSDSC2BB24                  |       |        |                                                                        |
+---------------------------------------+-------+--------+------------------------------------------------------------------------+
| Local ATA Disk (naa.55cd2e404c090c3b) | SSD   | 223 GB | ineligible (Existing partitions found on disk 'naa.55cd2e404c090c3b'.) |
| ATA INTEL SSDSC2BB24                  |       |        |                                                                        |
|                                       |       |        | Partition table:                                                       |
|                                       |       |        | 1: 223.57 GB, type = vmfs ('51_SSD-Datastore')                         |
+---------------------------------------+-------+--------+------------------------------------------------------------------------+
From the output you can clearly see that there is an ineligible disk, as an existing partition
was found on that disk. This means the disk holds a previous file system and VSAN did not claim
it, as VSAN looks for disks without any file system/partition. Now it’s time to delete the old
partition on the disk and make it usable for VSAN operations. When you try to de-commission a VSAN
cluster, you may face the same challenge, as file system/disk partitions are not deleted during
the VSAN disable process from the vCenter server. Now you have a proper real time/production
scenario to explain to the Interviewer. Start your answer with the number of ESXi servers in the
cluster that are going to be part of VSAN operations, the number of disks used and the FTT values.
Summary: When you answer this question, you will explain the VSAN Infrastructure
planning/sizing decisions taken by the Architect. You are responsible for implementing the
solution and came across the ineligible disks issue during the deployment. The RVC tool is used to
find the problem, and later partedUtil is used to delete the existing partition from the disk.
Finally the local disks are visible for VSAN enablement and the VSAN Datastore is created
successfully.
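The partedUtil step from the summary can be sketched as follows. This Python helper only builds and prints the ESXi shell commands; it does not run anything. The helper name partition_cleanup_commands is hypothetical, and the device name and partition number are the ones shown in the RVC output above. Deleting a partition is destructive, so treat this as a sketch and confirm the disk on the host before running the printed commands.

```python
from typing import List


def partition_cleanup_commands(device: str, partition_numbers: List[int]) -> List[str]:
    """Build the ESXi shell commands to inspect, then delete, partitions on one disk."""
    path = f"/vmfs/devices/disks/{device}"
    # Show the current partition table first so the operator can double check it.
    commands = [f"partedUtil getptbl {path}"]
    # Delete from the highest partition number down.
    for number in sorted(partition_numbers, reverse=True):
        commands.append(f"partedUtil delete {path} {number}")
    return commands


if __name__ == "__main__":
    for line in partition_cleanup_commands("naa.55cd2e404c090c3b", [1]):
        print(line)
```

After the partition is removed, running vsan.disks_info again should show whether the disk has become eligible.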
Question 19: Have you installed vCenter 6.5 in HA mode (new feature)? How do you
troubleshoot if there is any issue with vCenter availability? What happens if the failover process
is not executed in HA? Can you access the vCenter server with the regular IP address?
Answer: It’s been a long time since I posted a scenario based question. You may not get this
detailed a question from the Interviewer, but I tried to cover some aspects of the questions that
can be asked. Having said that, VMware announced this vCenter HA feature with vSphere 6.5 as part
of the vCenter Server Standard license. So there is no additional cost involved for this feature,
& thanks for not making the licensing model a more complex topic. This feature is exclusively
available for the vCenter Server Appliance (VCSA). When vCenter HA is enabled, a three-node vCenter
Server cluster (Active, Passive, and Witness nodes) is deployed. vCenter HA provides an RTO of
about 5 minutes for vCenter Server, greatly reducing the impact of host, hardware, and
application failures with automatic failover between the Active and Passive nodes.
vCenter HA can also be enabled, disabled, or destroyed at any time, allowing customers to easily
take advantage of this new capability. There is also a maintenance mode that prevents planned
maintenance from causing an unwanted failover. That’s a lot of theory, and here is the famous
vCenter HA diagram. Do not confuse it with the cluster HA feature, which has been there for a long
time. It’s always tricky with the HA feature, as it creates additional machines. Let’s focus our
explanation on a crashed vCenter HA.
You can begin the story with the production scenario: our vCenter server is not responding even
though it’s configured in HA mode. Upon investigation we notice that vCenter is not on the network
and we can’t manage it remotely. When a connection is made to the ESXi server directly to see
what’s running in the vCenter console, it’s up and running but not reachable on the network. When
we ran the ifconfig command at the console, there was only the private IP, and the management IP
was not visible. It sounds like vCenter is still able to make heartbeat connections with its
witness node, so no vCenter failover happened. So you have a production outage, as vCenter is not
available and any software that depends on vCenter functionality is going to be affected, like
backups, disaster recovery … etc. That’s a good explanation of the problem, so let’s cover the
recovery steps required for this problem.
If all nodes in a vCenter HA cluster cannot communicate with each other, the Active node stops
serving client requests.
Problem
Node isolation is a network connectivity problem.
Procedure
1. Attempt to resolve the connectivity problem. If you can restore connectivity, isolated nodes
rejoin the cluster automatically and the Active node starts serving client requests.
2. If you cannot resolve the connectivity problem, you have to log in to Active node’s console
directly.
   a. Power off and delete the Passive node and the Witness node virtual machines.
   b. Log in to the Active node by using SSH or through the Virtual Machine Console.
   c. To enable the Bash shell, enter shell at the appliancesh prompt.
   d. Run the following command to remove the vCenter HA configuration.
      destroy-vcha -f
   e. Reboot the Active node. The Active node is now a standalone vCenter Server
      Appliance.
   f. Perform vCenter HA cluster configuration again.
These steps help you to bring your vCenter back online on the management network, and the tip
here is that you need to use VMRC to set the network card status to connected; this is not listed
in the official VMware document. I found that the vCenter VM has two network adapters, but only
the heartbeat network card was connected and the management one was not. I tried to run some Linux
based commands to bring it online, but the trick is that it’s in Edit Settings. I still need to
look for the same setting in the HTML client for ESXi, but for now you can use the VMRC console as
your answer. That’s it for this post, and I will publish more troubleshooting scenarios to share
my knowledge.
Question 20: Do you know the purpose & use case of vRA? Have you ever troubleshot VM
provisioning failures in vRA? Let’s assume there are two flavors of Desktop, Win 7 & Win 10,
with a couple of software packages like Chrome, Mozilla, IE & Java. Can you explain the
troubleshooting steps which you follow for a VM provisioning failure with the given
software?
Answer: As the focus has shifted from VMware operations to automation, most Interviewers
are asking questions related to vRealize Automation, with the primary use case of automated
provisioning failing when requested from the user portal. Let’s recap how VMware defines vRealize
Automation in its own words: vRealize Automation enables IT Automation through the creation and
management of personalized infrastructure, application and custom IT services (XaaS). This IT
Automation lets you deploy IT services rapidly across a multi-vendor, multi-cloud infrastructure.
Let me introduce some known errors from the vRA console:
Let’s start the explanation with a scenario like this: you got a call from the DevOps team, as
they failed to provision VMs from the self service portal. The assumption for this case is that
vRA is deployed in a minimal setup with 3 servers, and Tenants, Blueprints, catalogs and the
necessary configurations are pre-configured. You use blueprints to define machine deployment settings.
Published blueprints become catalog items, and are the means by which entitled users provision
machine deployments. vRealize Automation provides a secure portal where authorized
administrators, developers, or business users can request new IT services and manage specific
cloud and IT resources, while ensuring compliance with business policies. Requests for IT
services, including infrastructure, applications, desktops, and many others, are processed
through a common service catalog to provide a consistent user experience.
What next? You want to understand more about the problem, so first check which catalog item has
a problem and capture the details of the vCenter server & Template name associated with it. You
can connect to the vCenter server and check for any error messages for VM clone and guest
customization operations (if you are using them from the vCenter server). From the vRA portal, you
can request the VM and monitor the provisioning process to troubleshoot further. Make sure there
are no communication problems between the vRA components: SQL Server Database, DEM (Distributed
Execution Manager), IaaS & Web server components. Once it’s confirmed that the vRA components
are functional, you need to review the logs on the deployed Guest VM. You need to have a local
administrator or darwin user account to perform further troubleshooting steps for this scenario.
You need to start with the GuestAgent.log file to see if there are any connection failures. We
spoke a lot about the overall process, but please note that the vRA product has many areas to look
at before you conclude the problem area. So let me conclude the answer for the interview as
follows: the first error message occurred because the ESXi host that joined recently was missing a
particular VLAN, VLAN 150, which is required for the VM to communicate with the IaaS server. To
fix the issue we need to work with the Network team to trunk/enable VLAN 150 on this ESXi server’s
physical NICs. The second error message which I mentioned earlier was caused by the darwin account
having an expired password, as the Company password policy requires all user account passwords to
be changed every 90 days unless there is an exception for special service accounts. In this case
darwin is not treated as a service account, as it is local to the specific VM provisioned. If you
can explain 50% of the content which I explained in this blog post, your Interviewer can be
convinced that you have some knowledge of the vRA product. If you want to talk more about the key
features of this product, here is the required information.
Deliver a Self-Service Experience
• Unified IT service catalog – deliver infrastructure, container, application and custom services
(XaaS).
• Policy-based governance ensures the right service level to meet specific business needs.
• Automation accelerates IT service delivery.
Unified Blueprint Model via Design Canvas, Command Line or API
• Streamline IT service design process by assembling applications from prebuilt components
using a visual canvas with a drag and drop interface.
• Blueprints as Code – Export, import and edit automation blueprints as text.
• API – enable the entire design and management process via API calls.
• Leverage VMware and partner-provided blueprints in VMware Cloud Management Marketplace.
Deploy Across Multi-Vendor, Hybrid Cloud Infrastructure
• Flexibility to choose the right cloud platform and location that meets the business needs.
• Consistent governance and control across hybrid cloud deployments.
Accelerate Time to Value via an Extensible Automation Platform
• Extensible platform that enables customization and extensibility at multiple levels across IT
ecosystem.
• Design and automate the delivery of any IT services (XaaS) through service orchestration.
• Leverage VMware and partner-provided integration solutions in VMware Cloud Management
Marketplace.
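Going back to the first root cause in the troubleshooting answer (a newly added ESXi host missing VLAN 150), a quick inventory check like the one below would have flagged the host before the DevOps team hit the failure. This is a minimal Python sketch: the host names, the VLAN lists and the helper name hosts_missing_vlan are all hypothetical, and in real life the data would come from your switch or vCenter inventory.

```python
REQUIRED_VLAN = 150  # VLAN the provisioned VM needs to reach the IaaS server


def hosts_missing_vlan(host_vlans: dict, vlan_id: int = REQUIRED_VLAN) -> list:
    """Return the hosts whose trunk does not carry the given VLAN, sorted by name."""
    return sorted(host for host, vlans in host_vlans.items() if vlan_id not in vlans)


# Hypothetical inventory: host name -> VLAN IDs trunked to its physical NICs.
inventory = {
    "esxi-01": {100, 150, 200},
    "esxi-02": {100, 150, 200},
    "esxi-05": {100, 200},  # the recently added host
}

if __name__ == "__main__":
    print(hosts_missing_vlan(inventory))
```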
Question 21: Do you know the recent security patches released for Intel Spectre and
Meltdown? Have you patched ESXi servers (Cluster/standalone/DMZ/Lab)? Can you share a
couple of issues you ran into while patching those hosts?
Answer: This is a hot topic for many Administrators, not limited to VMware, at the start of the
new year 2018. Some background: “Meltdown and Spectre” are critical vulnerabilities existing in
several modern CPUs. These hardware bugs allow programs to steal data which is currently
processed on the computer. Meltdown and Spectre can affect personal computers, mobile
devices, servers and several cloud services.
Actually, the only way to minimize those security risks is to patch your operating systems and the
hypervisor level (if you are using virtual machines), and here is the latest VMware KB
Article about these patches.
These kinds of questions from the Interviewer check your awareness of active issues in the IT
Infrastructure space, followed by your troubleshooting depth based on the scenario you pick to
explain the answer. Let me share a couple of good scenarios.
ESXUPDATE Log file
2018-01-11T16:24:39Z esxupdate: 76791: downloader: DEBUG: Downloading http://vumserver:9084/vum/repository/hostupdate/vmw/vmw-ESXi-6.0.0-metadata.zip to /tmp/tmp4m_ng4…
2018-01-11T16:24:39Z esxupdate: 76791: esxupdate: ERROR: An esxupdate error exception was caught:
2018-01-11T16:24:39Z esxupdate: 76791: esxupdate: ERROR: Traceback (most recent call last):
2018-01-11T16:24:39Z esxupdate: 76791: esxupdate: ERROR: File "/usr/sbin/esxupdate", line 238, in main
2018-01-11T16:24:39Z esxupdate: 76791: esxupdate: ERROR: cmd.Run()
2018-01-11T16:24:39Z esxupdate: 76791: esxupdate: ERROR: File "/build/mts/release/bora-5224934/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esx5update/Cmdline.py", line 105, in Run
2018-01-11T16:24:39Z esxupdate: 76791: esxupdate: ERROR: File "/build/mts/release/bora-5224934/bora/build/esx/release/vmvisor/sys-boot/lib/python2.7/site-packages/vmware/esximage/Transaction.py", line 73, in DownloadMetadatas
2018-01-11T16:24:39Z esxupdate: 76791: esxupdate: ERROR: MetadataDownloadError: ('http://vumserver:9084/vum/repository/hostupdate/vmw/vmw-ESXi-6.0.0-metadata.zip', None, "('http://vumserver:9084/vum/repository/hostupdate/vmw/vmw-ESXi-6.0.0-metadata.zip', '/tmp/tmp4m_ng4', '[Errno 4] IOError: [Errno 104] Connection reset by peer')")
2018-01-11T16:24:39Z esxupdate: 76791: esxupdate: DEBUG: <<<
You can start the explanation like this: there is a remote branch office which needs to be patched
by shutting down the VMs (as your design has a single ESXi host), but applying the required
patches failed with the above error message. The log file and symptoms indicate a
communication failure between the VMware Update Manager server and the ESXi host. Then you tried
to copy the patches manually to the data store and apply them, but that failed as well. As this
problem seemed to be complex, you opened a case with VMware support to resolve it.
VMware support requested the log bundle from the ESXi server and the Update Manager server to
analyze the issue. They found issues with the patch repository, which were resolved by renaming
the file
“D:\ProgramData\VMware\VMware Update Manager\Data\hostupdate\vmw\vmw-ESXi-6.0.0-metadata.zip”
and then downloading the metadata again from the patch source.
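When you explain this scenario, it helps to show that you can pull the key facts out of esxupdate.log quickly. The Python sketch below extracts the exception name, the URL and the final errno from one ERROR line. The regular expressions and the function name summarize_esxupdate_error are my own assumptions, written against the single log line shown above (with plain straight quotes, as in a real log file), so other esxupdate messages may need different patterns.

```python
import re

# Exception name and the first quoted URL after "ERROR:".
ERROR_RE = re.compile(r"ERROR: (?P<exc>\w+Error): \('(?P<url>[^']+)'")
# The errno whose message runs up to the closing quote (the innermost one).
ERRNO_RE = re.compile(r"\[Errno (?P<code>\d+)\] (?P<text>[^\[\]']+)'")


def summarize_esxupdate_error(line: str):
    """Return (exception, url, errno, message) from one esxupdate ERROR line, or None."""
    head = ERROR_RE.search(line)
    tail = ERRNO_RE.search(line)
    if not head or not tail:
        return None
    return head["exc"], head["url"], int(tail["code"]), tail["text"]


sample = (
    "2018-01-11T16:24:39Z esxupdate: 76791: esxupdate: ERROR: MetadataDownloadError: "
    "('http://vumserver:9084/vum/repository/hostupdate/vmw/vmw-ESXi-6.0.0-metadata.zip', None, "
    "\"('http://vumserver:9084/vum/repository/hostupdate/vmw/vmw-ESXi-6.0.0-metadata.zip', "
    "'/tmp/tmp4m_ng4', '[Errno 4] IOError: [Errno 104] Connection reset by peer')\")"
)

if __name__ == "__main__":
    print(summarize_esxupdate_error(sample))
```

The errno 104 (connection reset by peer) together with the Update Manager URL is what points you at the communication between the host and the patch repository.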
Scenario 2: Failed to migrate VMs from an ESXi host while trying to put it
in Maintenance Mode
Migrate virtual machine:
The vMotion failed because the destination host did not receive data from the source host on the
vMotion network. Please check your vMotion network settings and physical network
configuration and ensure they are correct
This scenario is a vMotion issue stopping your patching activity, as vMotion is the only way to
perform live migration of running virtual machines and put the ESXi host in maintenance mode. You
started checking the vMotion related settings like the VMkernel adapter, IP address and port group
settings, but all of them looked good. You tried the vmkping command from the source to the
destination host, but it failed. It seems there is a communication issue on the vMotion network
and the VLAN associated with it. When you engaged the Network team to validate the connectivity,
they confirmed there were no firewall rule changes and that other IPs in the same subnet are able
to communicate. This forced you to check other settings. Upon more investigation you found that
these are HP Blade servers and the vMotion network is configured as an internal network which
doesn’t have an uplink associated. HP confirmed it’s a known issue with the current
Openview/c7000/Flex backend network settings, and HP support suggested moving the vMotion network
to another port group where communication is not broken. This helped you to migrate all the VMs
to other ESXi servers in the cluster and complete the patching activity.
Question 22: What is HCI (Hyper-Converged Infrastructure), and how do you manage VMware
Infrastructure when there is an HCI solution deployed, like Nutanix (for this scenario)? What are
the commands to run against the cluster to check the health status? Have you ever run the upgrades
for the Controller VM (AOS), Prism Central (PC – like vCenter), BMC/BIOS & Hypervisor? Can you
explain the responsibilities to configure/manage Nutanix based clusters?
Answer: HCI, aka Hyper-Converged Infrastructure, has become a turn-key solution, and many teams
are migrating from Traditional Infrastructure to HCI based solutions. Legacy infrastructure (with
separate storage, storage networks, and servers) is not well suited to meet the growing demands of
enterprise applications or the fast pace of modern business. The silos created by traditional
infrastructure have become a barrier to change and progress, adding complexity to every step,
from ordering to deployment to management. New business initiatives require buy-in from
multiple teams, and IT needs must be predicted 3-to-5 years in advance. As most IT teams
know, this involves a substantial amount of guesswork and is almost impossible to get right.
Hyper-converged infrastructure combines common datacenter hardware using locally attached
storage resources with intelligent software to create flexible building blocks that replace legacy
infrastructure consisting of separate servers, storage networks, and storage arrays. Nutanix
converges the entire data center stack, including compute, storage, storage networking, and
virtualization. Complex and expensive legacy infrastructure is replaced by Nutanix Enterprise
Cloud OS running on state-of-the-art, industry-standard servers that enable enterprises to start
small and scale one node at a time.
Each server, also known as a node, includes Intel-powered x86 or IBM Power hardware with
flash SSDs and HDDs. Nutanix software running on each server node distributes all operating
functions across the cluster for superior performance and resilience. The Acropolis Distributed
Storage Fabric simplifies storage and data management for virtual environments. By pooling
flash and hard disk drive storage across a Nutanix cluster and exporting it as a data store to the
virtualization layer as iSCSI, NFS, and SMB shares, DSF eliminates the need for SAN and NAS
solutions. A key concept for HCI is using SSD disks for hot data (active) and HDD disks for cold
data (non-active).
Acropolis Distributed Storage Fabric joins HDD and SSD resources from across a
cluster into a storage pool.
Enough theory; let me help you with useful Nutanix Administrator commands for day to day
operations. The key aspect of HCI is to make sure the storage stays online during the maintenance
window or during any node/disk/chassis failures. AOS (the Operating System) on the Controller VM
requires frequent upgrades to get the newer feature sets, and 1-Click upgrade is a famous selling
point in the Nutanix Sales campaign. LCM (Life Cycle Manager) is a 1-Click tool to upgrade the
BMC/BIOS and other device firmware like the LSI Controller, Disks & SATADOM. Prism Central is
another Virtual Machine to be deployed in the Infrastructure, like vCenter, and you should always
keep Prism Central at a higher version than your Prism Element Clusters. (This point helps to
present you as a real-time Nutanix Administrator.) You can explain some theory from the above HCI
notes followed by the commands below, which will give the Interview panel the confidence to select
you for Nutanix Administrator jobs. Make sure you stress points such as AOS upgrade failures with
/home full, failed SATADOM replacement, CVM rescue, Nutanix Files, Async DR, NearSync, Metro
Clusters & shutdown token failing to release from a CVM, to keep the discussion more real-time
and move it from theory to practical knowledge.
1. svmips && hostips && ipmiips - Display all the IPs in the cluster
2. cluster status | grep -v UP - Display the status of the services in the Controller VM
3. nodetool -h 0 ring - Display whether the storage stack is online or not
4. ncli host list - Display the hypervisors list
5. ncli cluster get-redundancy-state - Display the Redundancy Factor value
6. allssh date - Display the date on ALL the Controller VMs
7. hostssh date - Display the date on ALL the Hypervisors
8. ncli cluster info - Display the Cluster details (number of nodes - cluster ID - VIP)
9. ncli ms list - Display the Hypervisor version
10. allssh ntpq -pn - Display the NTP stats
11. ncc --version - Display the Nutanix Cluster Check (NCC) version
12. ncc healthchecks run_all - Run the Nutanix Health Checks
13. ncc log_collector run_all - Run the Nutanix logs collection (Logbay is the future tool)
SAMPLE OUTPUT FROM NUTANIX CLUSTER - THIS EXAMPLE HELPS YOU TO UNDERSTAND
THE COMMANDS ONLY
Id : 00058751-8a31-a11b-0000-00000000e3da::3
Uuid : 19878e31-0ff7-4f72-be10-d480fc8706ff
Name : NTNX-SERIAL-A
IPMI Address : 10.xx.28.176
Controller VM Address : 10.xx.30.176
Controller VM NAT Address :
Controller VM NAT PORT :
Hypervisor Address : 10.xx.29.176
Hypervisor Version : Nutanix 20170830.184
Host Status : NORMAL
Oplog Disk Size : 200 GiB (214,748,364,800 bytes) (0.5%)
Under Maintenance Mode : null (-)
Metadata store status : Metadata store enabled on the node
Node Position : Node physical position can't be displayed for this model. Please refer to Prism UI for this information.
Node Serial (UUID) : OM162S010412
Block Serial (Model) : 16SM52340075 (NX-6155-G5)
Cluster Id : 00058751-8a31-a11b-0000-00000000e3da::58330
Cluster Uuid : 00058751-8a31-a11b-0000-00000000e3da
Cluster Name : Diehard
Cluster Version : 5.10.3.1
Cluster Full Version : el7.3-release-euphrates-5.10.3.1-stable-655d4def34bf18785782f2adb8cdd5f8457d1fe3
External IP address :
Node Count : 1
Block Count : 1
Shadow Clones Status : Enabled
Has Self Encrypting Disk : no
Cluster Masquerading I... :
Cluster Masquerading PORT :
Is LTS : true
External Data Services... :
Support Verbosity Level : BASIC_COREDUMP
Lock Down Status : Disabled
Password Remote Login ... : Enabled
Timezone : UTC
NCC Version : ncc-3.7.0.2
Common Criteria Mode : Disabled
Degraded Node Monitoring : Enabled
Name : 10.xx.29.176
Uuid : dce5b69a-5905-41b9-97bf-a32c5c4d0b45
Hypervisor Type : AHV
Access URL : qemu+ssh://10.xx.29.176/system
User Name : root
Password : ****
Hypervisor Version : el6.nutanix.20170830.184
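If you need to check a single field across many hosts, the "Key : Value" layout of the sample output above is easy to parse. The Python sketch below is a minimal example; the function name parse_ncli_block is hypothetical, and it assumes one field per line, so a value that wraps onto a second line without a colon is simply skipped.

```python
def parse_ncli_block(text: str) -> dict:
    """Parse 'Key : Value' lines from ncli-style output into a dict."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as IDs and URLs contain colons too.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            record[key.strip()] = value.strip()
    return record


sample = """\
Name : NTNX-SERIAL-A
Controller VM Address : 10.xx.30.176
Controller VM NAT Address :
Host Status : NORMAL
Id : 00058751-8a31-a11b-0000-00000000e3da::3
"""

if __name__ == "__main__":
    host = parse_ncli_block(sample)
    print(host["Name"], host["Host Status"], host["Id"])
```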
2. The clock time of an ESXi 6.x host is not correct. What should an administrator do to
correct this issue?
To correct the time on the ESXi host, modify the time for the host using the vSphere client and
correct the NTP settings in the /etc/ntp.conf file.
3. An administrator wants to shut down an ESXi host from its console. Which option would
be used in the Direct Console User Interface to perform this task?
To shut down the host from the Direct Console User Interface (DCUI), the administrator presses the
F12 key.
4. An administrator can access an ESXi host via vCenter Server using the vSphere Web Client
but is unable to access it directly via the vSphere Client. What should he do to access the ESXi
host directly?
If an ESXi host can be reached through the vSphere Web Client but can’t be accessed directly,
check whether Lockdown mode is enabled. If it is enabled, disable it, because with
Lockdown mode enabled, ESXi hosts can only be accessed via vCenter Server; you cannot directly
access any host.
9. What happens to the files contained on shared storage when a Content Library is
deleted?
When a Content Library is deleted, all files stored in the content library are deleted.
10. What is the maximum number of vCPUs that can be allocated to a VM in vSphere 6.0?
A maximum of 128 vCPUs can be allocated to a VM in vSphere 6.0.
11. A Windows domain user can log in to vSphere using the vSphere Web Client. What
are the requirements to be met for this feature to be available and functional?
An administrator can allow users to log in to the vSphere Web Client using Windows session
authentication. For this purpose, install the vSphere Web Client Integration browser plug-in on
each computer from which a user will sign in. The users must be signed in to Windows using
Active Directory user accounts, and the administrator must create a valid Identity Source in Single
Sign-On for the users’ domain.
12. An administrator wants to clone a virtual machine using the vSphere Client. What
explains why the Clone option is missing?
Cloning a VM can only be performed through vCenter Server, whether you are connected via the
vSphere Web Client or the vSphere Client. If you are directly connected to an ESXi host, you cannot
clone a VM.
13. What will happen if the .nvram file is accidentally deleted from a VM?
The .nvram file is used to store the BIOS state of a VM. If it is deleted for some reason, the
.nvram file will be created again when the virtual machine is powered on.
14. An administrator wants to connect the vSphere 5.5 Client to an ESXi 6.x host. What will
occur?
If the administrator tries to connect the vSphere 5.5 Client to an ESXi 6.x host, the operation
prompts the administrator to run a script to upgrade the vSphere Client.
15. Which secondary Private VLAN (PVLAN) type can send packets to an Isolated
PVLAN?
The Promiscuous PVLAN type can communicate with and send packets to an Isolated PVLAN.
16. What sample roles are provided by default when vCenter Server is installed?
When vCenter Server is installed, sample roles such as Virtual machine user and Network
administrator are provided by default.
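17. What is a potential cause of an all paths down (APD) event for software FCoE
storage?
A potential cause is that Spanning Tree Protocol (STP) is enabled on the network switch ports
used by the software FCoE adapters. STP can delay the FCoE Initialization Protocol (FIP) response
at the switch and lead to an APD condition.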
18. What methods are available to upgrade a host from ESXi 5.x to ESXi 6.x?
vSphere Update Manager (VUM), the esxcli command-line tool, and vSphere Auto Deploy can be
used to upgrade.
20. A vCenter Server upgrade fails at the vCenter Single Sign-On installation. What
should be done to complete the upgrade process?
Before upgrading vCenter Server, verify that the VMware Directory service can be stopped by
restarting it manually. If the service stops and restarts cleanly, you can start the vCenter
Server upgrade again.
21. What prerequisites should be considered before upgrading the vCenter Server
Appliance?
Whether you are upgrading the vCenter Server Appliance (vCSA) or performing a fresh
installation, the Client Integration Plug-in (CIP) must be installed first in both cases.
22. After deploying a PSC, the vCenter Server installation fails and shows the following
error:
Could not contact Lookup Service. Please check VM_ssoreg.log.
If this error appears, verify that the clocks on the host machines running the PSC, vCenter
Server, and the vSphere Web Client are synchronized with each other. Also ensure that no
firewall is blocking port 7444 between the PSC and vCenter Server.
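The firewall part of this check can be sketched as a simple TCP connection test run from the
vCenter Server machine. This is a minimal illustration, not a VMware tool: the function name
port_is_reachable is hypothetical, and only port 7444 comes from the answer above.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, port_is_reachable("psc.example.local", 7444) (a placeholder host name) returning
False would suggest that the port is blocked or that nothing is listening on it. The clock
synchronization check has to be done separately on each machine.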
23. An administrator installed Windows Server 2008 on a virtual machine and wants to install
vCenter Server on it, but the installation fails. Why?
vCenter Server requires a 64-bit Windows operating system. Installation on Windows Server 2008
fails; vCenter Server can be installed on Windows Server 2008 R2 or a later Windows Server
release.
24. What is the minimum Virtual Hardware version required for vFlash Read Cache?
vFlash Read Cache was first introduced in vSphere 5.5, so the minimum Virtual Hardware version
is the one that shipped with vSphere 5.5: version 10.
25. An ESXi host is added to vCenter Server but shows as not responding in the vSphere Web
Client. If this issue is caused by a firewall, which port should be opened?
If an ESXi 6.x host that was added to vCenter Server shows as not responding and the cause is a
network firewall blocking traffic, check that port 902 (UDP) is not blocked. If it is blocked,
enable the port in the host's Security Profile by selecting the ESXi host in vCenter Server
through the vSphere Web Client.
28. While upgrading an ESXi 5.5 host to ESXi 6.x, the following error appears:
MEMORY_SIZE
What does this error indicate?
It indicates that the ESXi host has insufficient memory to complete the upgrade from ESXi 5.5
to ESXi 6.x.
29. While removing a host from a vSphere Distributed Switch (vDS), the following error message
is observed:
The resource '10' is in use
Before removing the host from the vDS, ensure that the VMkernel network adapters on the vDS are
not in use. If any resource on the vDS is still being used, the error message above appears
with the ID of that resource.
30. An administrator wants to monitor network traffic and tries to capture network traffic
for a VM, but cannot see the expected traffic in the packet capture tool. What should
he do to resolve the problem?
To capture network traffic for a VM, the administrator should enable Promiscuous Mode on the
relevant port group. The traffic can then be captured with any network traffic capture tool.
31. A vSAN cluster is created with six nodes, and three of them are moved into a fault
domain. One member node of the fault domain fails. What happens to the remaining two
nodes in the fault domain?
When a member node of a fault domain fails, the remaining two members of that fault domain are
treated as failed.
32. At which level is a vSAN Fault Domain configured?
A fault domain is configured at the vSAN cluster level, and nodes are then added to it. If any
member node fails for any reason, the remaining members of that fault domain are also
considered failed.
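The rule stated in questions 31 and 32 can be modelled in a few lines. This is only an
illustrative sketch of the rule as described above, not vSAN code; the host names and the
function hosts_treated_as_failed are hypothetical.

```python
def hosts_treated_as_failed(fault_domains, failed_host):
    """Return the set of hosts considered failed when failed_host goes down.

    fault_domains maps a fault domain name to the list of its member hosts.
    A host that belongs to no fault domain only takes itself down.
    """
    for members in fault_domains.values():
        if failed_host in members:
            # Per the rule above, the whole fault domain is treated as failed.
            return set(members)
    return {failed_host}


# Six-node cluster with three hosts placed in one fault domain (hypothetical names).
fault_domains = {"FD1": ["esx01", "esx02", "esx03"]}
print(sorted(hosts_treated_as_failed(fault_domains, "esx02")))  # ['esx01', 'esx02', 'esx03']
print(sorted(hosts_treated_as_failed(fault_domains, "esx05")))  # ['esx05']
```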
33. It is observed that VM storage activity on an ESXi 6.x host is negatively affecting
VM storage activity on another host that is accessing the same VMFS datastore.
Which action would mitigate the issue?
To prevent one VM's storage activity from affecting another VM's storage activity, Storage I/O
Control (SIOC) should be enabled. Storage I/O Control provides control over storage I/O and
should be used to ensure that the performance of your critical VMs is not affected by VMs on
other hosts when there is contention for I/O resources.
34. While upgrading an ESXi host from 5.5 to 6.0, the administrator runs the following
command:
esxcli software vib list --rebooting-image
What will be shown by this command?
This command lists the VIBs (vSphere Installation Bundles) in the image that becomes active
after the next reboot, rather than the VIBs in the currently running image. A VIB is a
collection of files packaged into a single archive, similar to a tarball or ZIP file, to
facilitate distribution.
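To see what a pending reboot would change, the output of the command can be compared with and
without the rebooting-image option. The sketch below is a rough illustration under stated
assumptions: the option reports the image that takes effect after the next reboot, the output
is the usual table with columns separated by two or more spaces and Name and Version first,
and the VIB names, versions, and vendor in the sample are made up.

```python
import re


def parse_vib_list(text):
    """Map VIB name to version from tabular `esxcli software vib list` output."""
    vibs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the dashed separator row.
        if not line or set(line) <= {"-", " "}:
            continue
        columns = re.split(r"\s{2,}", line)
        # Skip the header row and anything that does not look like a data row.
        if len(columns) < 2 or columns[:2] == ["Name", "Version"]:
            continue
        vibs[columns[0]] = columns[1]
    return vibs


def pending_changes(current_text, rebooting_text):
    """Return {name: (current_version, version_after_reboot)} for VIBs that differ."""
    current = parse_vib_list(current_text)
    pending = parse_vib_list(rebooting_text)
    return {
        name: (current.get(name), pending.get(name))
        for name in sorted(set(current) | set(pending))
        if current.get(name) != pending.get(name)
    }


# Hypothetical sample output; real VIB names and versions will differ.
CURRENT_IMAGE = """\
Name            Version  Vendor  Acceptance Level  Install Date
--------------  -------  ------  ----------------  ------------
example-driver  1.0-1    ACME    VMwareCertified   2015-01-01
example-tools   2.0-1    ACME    VMwareCertified   2015-01-01
"""
REBOOTING_IMAGE = CURRENT_IMAGE.replace("1.0-1", "1.1-1")

print(pending_changes(CURRENT_IMAGE, REBOOTING_IMAGE))
# {'example-driver': ('1.0-1', '1.1-1')}
```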
35. To troubleshoot CPU performance issues of a VM, which counters are used to
demonstrate CPU contention?
The ESXTOP tool is used to examine the performance of an ESXi host in terms of memory, CPU, and
network utilization. It is a very useful tool for VMware administrators troubleshooting
performance issues. To run ESXTOP remotely, SSH must be enabled on the host (for example from
the vSphere Client) and an SSH client such as PuTTY is needed. To demonstrate CPU contention,
the %RDY, %MLMTD, and %CSTP counters are used.
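As a rough illustration of how these three counters might be read, the sketch below flags the
ones that exceed a threshold. The thresholds are commonly quoted rules of thumb and are
assumptions here, not values from the text above; the function name contention_flags is
hypothetical.

```python
# Rule-of-thumb thresholds in percent (assumptions; tune for your environment).
THRESHOLDS = {"%RDY": 10.0, "%CSTP": 3.0, "%MLMTD": 0.0}


def contention_flags(counters):
    """Return the names of counters whose value exceeds its threshold."""
    return [
        name
        for name, limit in THRESHOLDS.items()
        if counters.get(name, 0.0) > limit
    ]


# Example reading for one VM, with made-up values.
print(contention_flags({"%RDY": 14.2, "%CSTP": 0.5, "%MLMTD": 0.0}))  # ['%RDY']
```

A non-zero %MLMTD points at a CPU limit on the VM, while high %RDY or %CSTP points at the
host being unable to schedule the VM's vCPUs in time.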
36. An administrator enables SSH and uses PuTTY to run esxtop to troubleshoot CPU
performance issues, but no output is displayed. How can this issue be resolved?
To display output in ESXTOP, press f and place an asterisk next to each field that should be
displayed.
37. An administrator wants to monitor VMs on a host using vCenter Server and send
notifications when memory usage crosses 80%. What should the administrator do in
vCenter Server to accomplish this?
Create a vCenter Server alarm that monitors VM memory usage, triggers when usage reaches 80%,
and has an action that sends an email notification.
38. An administrator created a DRS cluster and it became unbalanced. What are the likely
causes?
A DRS cluster can become unbalanced when affinity rules prevent VMs from being moved, or when a
device mounted to a VM prevents vMotion from one host to another.
39. An IT administrator configured two vCenter Servers within a PSC, and needs to grant
a user privileges to access all environments. What access level is required to
access all the environments?
To access multiple vCenter Servers within a PSC, the user must be granted a Global Permission,
which applies across all environments.
40. An administrator created 10 ESXi 6.x hosts via Auto Deploy for a new Test/Dev
cluster and all hosts are configured to obtain their IP address via DHCP. Which DCUI
option should the administrator use to renew the DHCP lease for the hosts?
To renew the DHCP lease for the hosts, use the "Restart Management Network" option of the
Direct Console User Interface (DCUI).