VMware vSphere HA (High Availability) is a utility included in VMware's vSphere software that can restart failed virtual machines (VMs) on alternative host servers to reduce application downtime.
vSphere HA enables a server administrator to pool physical servers on the same network into a logical group called a high availability cluster. During a server failure, such as a system crash, power interruption or network failure, vSphere HA detects which VMs are down and restarts them on another stable system within the cluster. This process of restarting failed workloads on secondary systems is called failover.
VMware first introduced the feature, then called VMware HA, in VMware Infrastructure 3 in 2006 and has continued to develop and support it.
How vSphere HA works
VMware vSphere HA allows organizations to improve availability by automatically detecting failed VMs and restarting them on different physical servers without manual intervention. Restarting these VMs on different physical hardware is possible because Virtual Machine Disk (VMDK) files are kept on shared storage that is accessible to all physical servers in the HA cluster.
VMware vSphere HA uses a utility called the Fault Domain Manager agent to monitor ESXi host availability and to restart failed VMs. When setting up vSphere HA, an administrator defines a group of servers to serve as a high-availability cluster. The Fault Domain Manager runs on each host within the cluster. One host in the cluster serves as the master host, and all other hosts are referred to as slaves. The master host monitors signals from the other hosts in the cluster and communicates with the vCenter Server.
Host servers within an HA cluster communicate via a heartbeat, which is a periodic message that indicates a host is running as expected. If the master host fails to detect a heartbeat signal from another host or VM within the cluster, it instructs vSphere HA to take corrective actions. The type of action depends on the type of failure detected, as well as user preferences. In the case of a VM failure in which the host server continues to run, vSphere HA restarts the VM on the original host. If an entire host fails, the utility restarts all affected VMs on other hosts in the cluster.
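The detection and response logic described above can be sketched in a few lines of Python. This is an illustrative model only, not VMware's implementation: the function names, the host names and the 10-second timeout are assumptions made for the example.

```python
def hosts_missing_heartbeat(last_seen, now, timeout_s):
    """Return hosts whose most recent heartbeat is older than timeout_s.

    last_seen maps a host name to the time (in seconds) of its last heartbeat.
    The timeout value is a hypothetical parameter, not a vSphere default.
    """
    return sorted(host for host, seen in last_seen.items() if now - seen > timeout_s)


def choose_restart_host(original_host, host_is_running, healthy_hosts):
    """Pick where a failed VM restarts, following the two cases in the text."""
    if host_is_running:
        # Only the VM failed: restart it on the original host.
        return original_host
    # The whole host failed: restart on another host in the cluster, if any.
    others = [h for h in healthy_hosts if h != original_host]
    return others[0] if others else None


# Hypothetical cluster state: esx2 has been silent for 13 seconds.
last_seen = {"esx1": 100.0, "esx2": 88.0, "esx3": 99.5}
silent = hosts_missing_heartbeat(last_seen, now=101.0, timeout_s=10.0)
healthy = [h for h in sorted(last_seen) if h not in silent]
```

In this sketch, a VM whose host is still running stays where it is, while VMs from a silent host are assigned to the first healthy host. A real cluster would also weigh capacity and user preferences when choosing the target.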
The HA utility can also restart VMs if a host continues to run but loses its network connection to the rest of the cluster. To tell whether a network-isolated host is still running, the master host can check whether that host is still communicating with the network-connected datastores. Shared storage, such as a storage area network, enables every host in the cluster to access VM disk files and restart a VM, even if it was running on another server in the cluster.
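The way the two signals combine can be shown as a small decision function. This is a simplified sketch under the assumption that each signal reduces to a yes/no value; the state labels are chosen for the example and are not vSphere identifiers.

```python
def classify_host(network_heartbeat_ok, datastore_heartbeat_ok):
    """Classify a host from its network and datastore signals.

    Both arguments are booleans; the returned labels are hypothetical.
    """
    if network_heartbeat_ok:
        # The host still answers over the network, so it is running normally.
        return "running"
    if datastore_heartbeat_ok:
        # No network heartbeat, but the host still touches shared storage:
        # it is alive and cut off from the rest of the cluster.
        return "network_isolated"
    # Neither signal is present, so the host is treated as failed.
    return "failed"
```

The second check is what lets the master host distinguish a host that has crashed from one that has only lost its network connection.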
VMware vSphere HA requirements and best practices
Administrators can adjust many HA settings, including how long a VM or host is unavailable before vSphere HA attempts to restart it; the default value is 120 seconds. An admin can also set VM restart priorities, which determine the order in which VMs restart in the cluster. This setting is useful if, for example, the cluster has insufficient capacity to restart all the failed VMs. In many cases, an administrator assigns a higher restart priority to VMs running mission-critical applications.
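As a rough illustration of why restart priority matters when capacity is short, the following Python sketch orders failed VMs by priority and restarts them until the spare memory runs out. The `VM` class, the numeric priority scale and the use of memory as the only capacity measure are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    priority: int   # hypothetical scale: lower number restarts first
    memory_gb: int  # memory the VM needs on its new host


def plan_restarts(failed_vms, free_memory_gb):
    """Return (restarted, deferred) VM names, honoring restart priority."""
    restarted, deferred = [], []
    for vm in sorted(failed_vms, key=lambda v: v.priority):
        if vm.memory_gb <= free_memory_gb:
            restarted.append(vm.name)
            free_memory_gb -= vm.memory_gb
        else:
            # Not enough spare capacity left for this VM.
            deferred.append(vm.name)
    return restarted, deferred


failed = [VM("batch", 3, 8), VM("db", 1, 16), VM("web", 2, 8)]
restarted, deferred = plan_restarts(failed, free_memory_gb=24)
```

With 24 GB free, the mission-critical `db` VM and then `web` are restarted, and the low-priority `batch` VM is deferred.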
An organization can also define affinity and anti-affinity rules to restrict where certain VMs are placed. Affinity and anti-affinity rules prevent specified VMs from restarting on selected servers or on servers that already host other specified VMs. These rules are useful to ensure that CPU-intensive VMs don't restart on the same host after a disaster or to ensure that two copies of a high-priority application don't end up on the same host and create a potential single point of failure.
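A placement check along these lines can be sketched as a filter over candidate hosts. The data structures below (a host-to-VMs mapping, anti-affinity groups as sets of VM names, and a per-VM set of excluded hosts) are hypothetical and exist only to illustrate the idea.

```python
def eligible_hosts(vm, placement, anti_affinity_groups, excluded_hosts):
    """Return the hosts on which vm may restart under the given rules.

    placement: dict mapping host name to the set of VM names running there.
    anti_affinity_groups: list of sets of VM names that must not share a host.
    excluded_hosts: dict mapping a VM name to hosts it must not run on.
    """
    # Collect every VM that must not share a host with this one.
    conflicts = set()
    for group in anti_affinity_groups:
        if vm in group:
            conflicts |= group - {vm}
    banned = excluded_hosts.get(vm, set())
    return sorted(
        host
        for host, vms in placement.items()
        if host not in banned and not (vms & conflicts)
    )


placement = {"esx1": {"app-a"}, "esx2": {"web"}, "esx3": set()}
groups = [{"app-a", "app-b"}]
excluded = {"app-b": {"esx3"}}
candidates = eligible_hosts("app-b", placement, groups, excluded)
```

Here `app-b` cannot join `app-a` on `esx1` and is barred from `esx3`, so only `esx2` remains as a restart target.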