First, verify that the ESXi host is powered on
Make sure the host is turned on, both physically in the rack and as seen through the remote console (iLO/iDRAC). If it is powered on but unresponsive, it may have hit the infamous PSOD (Purple Screen of Death, also known as the Purple Diagnostic Screen).
Second, restart the Management agents
The management agents are responsible for synchronizing VMware components and granting access to the ESXi host through vCenter Server. Here is a recap of the steps for restarting them:
Restart Management agents in ESXi Using ESXi Shell or Secure Shell (SSH):
Log in to ESXi Shell or SSH as root. For enabling ESXi Shell or SSH, see Using ESXi Shell in ESXi 5.x and 6.x (2004746).
Restart the ESXi host daemon and vCenter Agent services using these commands:
/etc/init.d/hostd restart
/etc/init.d/vpxa restart
Alternatively:
To reset the management network on a specific VMkernel interface (vmk0 by default), run the command:
esxcli network ip interface set -e false -i vmk0; esxcli network ip interface set -e true -i vmk0
Note: The semicolon (;) between the two commands ensures the VMkernel interface is disabled and then re-enabled in succession. If the management interface is not running on vmk0, change the command to match the VMkernel interface used.
To restart all management agents on the host, run the command:
services.sh restart
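If you restart the agents often, the two init scripts for hostd and vpxa can be wrapped in a small function. This is only a sketch: the function name and the RUN variable are made up for illustration (RUN=echo gives a dry run that prints the commands instead of executing them), and it assumes a POSIX-style shell such as the one ESXi Shell provides.

```shell
#!/bin/sh
# Sketch: restart hostd and vpxa one after the other.
# RUN is a hypothetical dry-run switch, not an ESXi feature:
# RUN=echo prints each command instead of running it.
restart_mgmt_agents() {
    for svc in hostd vpxa; do
        # Stop at the first failure so the failing service is obvious.
        ${RUN:-} "/etc/init.d/$svc" restart || {
            echo "restart of $svc failed" >&2
            return 1
        }
    done
}
```

Running `RUN=echo restart_mgmt_agents` first lets you see exactly what would be executed before you touch a production host.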
Third, verify the network connectivity exists from vCenter Server to the ESXi host
Although it seems obvious, you'll be surprised how many people forget to do it beforehand. To check, just run a ping test between vCenter Server and your ESXi host.
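The ping test can be scripted so that it gives a clear yes/no answer. The sketch below assumes a Linux-style ping that accepts -c (as on the vCenter Server Appliance or in ESXi Shell); the function name and the PING_CMD variable are illustrative only, and the address in the usage note is a placeholder.

```shell
#!/bin/sh
# Sketch: send three echo requests and report the result.
# PING_CMD is a hypothetical override; on an ESXi host you might
# set PING_CMD=vmkping to send the test through a VMkernel interface.
check_reachable() {
    target=$1
    if ${PING_CMD:-ping} -c 3 "$target" >/dev/null 2>&1; then
        echo "$target: reachable"
    else
        echo "$target: no reply"
        return 1
    fi
}
```

For example, `check_reachable 192.0.2.10` from the vCenter Server side, with your host's management IP in place of the placeholder. Keep in mind that a blocked ICMP path does not necessarily mean the management ports are blocked too.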
Fourth, verify that you can connect from vCenter Server to your ESXi host
The tricky thing about vCenter is that it relies on heartbeats sent by the ESXi host, and vCenter Server has a window of 60 seconds to receive them. If no heartbeat arrives from the host within 60 seconds, vCenter Server marks the host as Not Responding and eventually as Disconnected.
Sometimes this happens because the ESXi host simply cannot reach a vCenter Server that sits behind NAT.
If that is your case, you have to allow connections from the ESXi host to vCenter Server on port 902 (TCP/UDP).
You can easily test port 902 connectivity with Telnet. Note that Telnet only checks the TCP side.
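Where Telnet is not installed, netcat can do the same TCP check. The following is a sketch under assumptions: it expects a netcat build that supports -z (connect without sending data) and -w (timeout in seconds), and the function name and NC_CMD variable are invented for this example. Like Telnet, it says nothing about UDP traffic on port 902.

```shell
#!/bin/sh
# Sketch: test whether a TCP port (902 by default) accepts connections.
# NC_CMD is a hypothetical override for the netcat binary to use.
check_tcp_port() {
    host=$1
    port=${2:-902}
    if ${NC_CMD:-nc} -z -w 3 "$host" "$port" >/dev/null 2>&1; then
        echo "$host:$port open"
    else
        echo "$host:$port closed or filtered"
        return 1
    fi
}
```

Run it from the ESXi host against the vCenter Server address, and from the vCenter side against the host, since a NAT or firewall rule may allow one direction only.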
These VMware Knowledge Base articles will come in handy:
Verifying the vCenter Server Managed IP Address (1008030)
ESXi 5.0 hosts are marked as Not Responding 60 seconds after being added to vCenter Server (2020100)
ESXi/ESX host disconnects from vCenter Server after adding or connecting it to the inventory (2040630)
ESX/ESXi host keeps disconnecting and reconnecting when heartbeats are not received by vCenter Server (1005757)
By the way, on a congested network you can raise the 60-second heartbeat timeout to, say, 120 seconds if necessary.
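To see what a longer timeout changes, here is a toy model of the decision described above. It is not vCenter code: the function name is made up, and it simply assumes a host counts as Not Responding once the time since its last heartbeat exceeds the timeout.

```shell
#!/bin/sh
# Toy model: classify a host by seconds since its last heartbeat.
# Usage: heartbeat_state ELAPSED_SECONDS [TIMEOUT_SECONDS]
heartbeat_state() {
    elapsed=$1
    timeout=${2:-60}   # 60 seconds is the default window from the text
    if [ "$elapsed" -gt "$timeout" ]; then
        echo "Not Responding"
    else
        echo "Connected"
    fi
}
```

With the default window, `heartbeat_state 90` prints "Not Responding", while `heartbeat_state 90 120` prints "Connected". That is the whole effect of the change: a host whose heartbeats are delayed by congestion gets more time before it is flagged.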
Try disconnecting your ESXi host from vCenter Server inventory and then connecting back
Nothing helped? Time for logs
As a first step, look into the vpxa log file (/var/log/vpxa.log). If the trouble is caused by a lack of service console memory allocated to the vCenter Server agent, the vpxa log will contain errors such as these:
[2007-07-28 17:57:25.416 ‘Memory checker’ 5458864 error] Current value 143700 exceeds hard limit 128000. Shutting down process.
[2007-07-28 17:57:25.420 ‘Memory checker’ 3076453280 info] Resource checker stopped.
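Rather than scrolling through the whole file, you can pull out just the memory-limit errors. This sketch assumes the message wording matches the sample above; the function name is illustrative, and the default path is the one mentioned in the text.

```shell
#!/bin/sh
# Sketch: print the current value and hard limit from every
# "exceeds hard limit" line in the vpxa log.
find_memory_limit_errors() {
    logfile=${1:-/var/log/vpxa.log}
    # -n plus the trailing p prints only the lines that matched.
    sed -n 's/.*Current value \([0-9][0-9]*\) exceeds hard limit \([0-9][0-9]*\).*/current=\1 limit=\2/p' "$logfile"
}
```

For the first sample line above it prints `current=143700 limit=128000`. No output means the log has no such errors and the cause lies elsewhere.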
Also verify that the hostd service is running and responds to commands. Look into the hostd log file (/var/log/vmware/hostd.log). For example, you may find an error such as this one:
2014-06-27T19:57:41.000Z [282DFB70 info ‘Vimsvc.ha-eventmgr’] Event 8002 : Issue detected on sg-pgh-srv2-esx10.sg-pgh.idealcloud.local in ha-datacenter: hostd detected to be non-responsive
Many things can lead to such an error, but the most common reason is that the host doesn't have enough resources for the hostd service.
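To judge whether this is a one-off or a recurring problem, count how often the event shows up. The sketch below assumes the message text matches the sample event above; the function name is made up, and the default path is the one given in the text.

```shell
#!/bin/sh
# Sketch: count "hostd detected to be non-responsive" events in the log.
# grep -c prints the number of matching lines; -F treats the pattern
# as a fixed string. The exit status is non-zero when the count is 0.
count_hostd_unresponsive() {
    logfile=${1:-/var/log/vmware/hostd.log}
    grep -c -F "hostd detected to be non-responsive" "$logfile"
}
```

A count that keeps growing between checks suggests the host is short on resources rather than suffering from a single hiccup.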