vSAN Health Checks Explained – Part 2


Hello everyone. In the last post we went through the health tests under the “Cluster” category. Now let's go through the next category, “Network”.

Hosts disconnected from VC

This alert is triggered when there is a communication issue over the management vmkernel port. Because the vSAN cluster uses its own dedicated port groups, the hosts might still be able to talk to each other over the vSAN network while one or more of them fail to talk to the vCenter Server. All ESXi hosts in the vSAN cluster use vCenter Server as the single source of truth: vCenter maintains the vSAN cluster membership and the unicast tables and pushes this information down to all the ESXi hosts in the cluster. Once connectivity between the hosts and vCenter is re-established, use the vSphere Client to reconnect the host (or hosts) to the vCenter Server.

Hosts with connectivity issues

This check covers scenarios where vCenter Server lists the host as connected, but API calls from vCenter to the host are failing. This causes the same issues as described in the previous test. Start by confirming the connection status on the management network between vCenter and the host. The check can also fail when the vSAN management daemon (vsanmgmtd) is down. You can check its status by SSHing to the host and querying the service, for example with /etc/init.d/vsanmgmtd status.

vSAN cluster partition

Starting with version 6.6, vSAN uses unicast communication between the hosts in the cluster. All the hosts in the vSAN cluster should be able to communicate with each other over the vSAN vmkernel port group. Otherwise the vSAN cluster will split into multiple partitions, i.e. sub-groups of hosts that can communicate among themselves but not with other sub-groups. In this scenario, vSAN objects might become unavailable. You can verify whether the hosts are in the same network partition group using the vSphere Client or esxcli.
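As a rough illustration of the esxcli route, the sketch below parses the text printed by `esxcli vsan cluster get` on one host and compares the "Sub-Cluster Member Count" field with the number of hosts you expect in the cluster. The sample output is abbreviated and made up, and the helper names (`parse_esxcli_fields`, `is_partitioned`) are hypothetical. Treat it as a minimal sketch, not a replacement for the health check itself.

```python
def parse_esxcli_fields(text):
    """Parse 'Key: value' lines from esxcli-style output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:  # skip headings such as "Cluster Information"
            fields[key] = value.strip()
    return fields


def is_partitioned(cluster_get_output, expected_hosts):
    """Return True if this host sees fewer members than the cluster should have."""
    fields = parse_esxcli_fields(cluster_get_output)
    return int(fields["Sub-Cluster Member Count"]) < expected_hosts


# Abbreviated, made-up sample of `esxcli vsan cluster get` output.
SAMPLE = """\
Cluster Information
   Enabled: true
   Local Node State: MASTER
   Local Node Health State: HEALTHY
   Sub-Cluster Member Count: 3
   Unicast Mode Enabled: true
"""

# A 4-node cluster in which this host only sees 3 members is partitioned.
print(is_partitioned(SAMPLE, expected_hosts=4))
```

Running the same comparison on every host shows which hosts ended up in which sub-group, since each host reports only the members of its own partition.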

All hosts have a vSAN vmknic configured

This test checks whether all the hosts in the vSAN cluster have a vmknic configured for vSAN traffic. This is essential for a host to participate in the vSAN cluster and for all the hosts to be in the same network partition group. Even if an ESXi host is part of the vSAN cluster but is not contributing storage, it must still have a VMkernel NIC configured for vSAN traffic.
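To check a single host by hand, you can look at the output of `esxcli vsan network list` and confirm that at least one vmknic is tagged for vSAN traffic. The sketch below does that on captured text. The sample output is abbreviated and made up, and the function name `vsan_vmknics` is hypothetical.

```python
def vsan_vmknics(network_list_output):
    """Return the vmknic names whose traffic type is 'vsan'."""
    nics = []
    current = None  # vmknic name of the interface block being read
    for line in network_list_output.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue  # headings such as "Interface" carry no value
        if key == "VmkNic Name":
            current = value.strip()
        elif key == "Traffic Type":
            if value.strip() == "vsan" and current is not None:
                nics.append(current)
            current = None  # this interface block is finished
    return nics


# Abbreviated, made-up sample of `esxcli vsan network list` output.
SAMPLE = """\
Interface
   VmkNic Name: vmk1
   IP Protocol: IP
   Traffic Type: vsan

Interface
   VmkNic Name: vmk2
   IP Protocol: IP
   Traffic Type: witness
"""

print(vsan_vmknics(SAMPLE))
```

An empty list for any host in the cluster corresponds to the condition this health check flags.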

vSAN: Basic (unicast) connectivity check

This test performs a basic ping test with a small packet size just to verify connectivity between the vSAN vmknics. If this passes, basic unicast connectivity is working fine.

vSAN: MTU check (ping with large packet size)

This test complements the previous connectivity test, but pings with a large packet size of 9000 bytes. It verifies that the end-to-end MTU between the vSAN hosts is consistent. In most cases an inconsistent MTU might not create a separate vSAN network partition, but it will degrade vSAN performance.

For this test to succeed, it is not necessary to configure the dvSwitch and the ToR switches for jumbo frames, as long as the MTU values are consistent end to end, for example the default of 1500. In that case the vSAN vmknic fragments the 9000-byte packet, and the fragments travel fine over the vSAN VLAN to the other ESXi host, where they are reassembled. But if you use a high MTU at the vmknic and a lower one at the physical switch (ToR), this can lead to packet loss and poor results.

You can manually test this from the ESXi shell using vmkping against the vSAN vmknic.
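When testing by hand, note that the size passed to vmkping is the ICMP payload, not the MTU: an IPv4 packet carries a 20-byte IP header and an 8-byte ICMP header, so a 9000-byte MTU corresponds to a payload of 8972 bytes (and 1500 to 1472). The sketch below works that out and assembles the command line using vmkping's `-I` (interface), `-s` (size) and `-d` (don't fragment) options. The vmknic name and the target address are placeholders, and the helper names are hypothetical.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP payload that fits into one unfragmented IPv4 packet."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def vmkping_command(vmknic, target_ip, mtu, dont_fragment=True):
    """Build a vmkping command line that probes the given MTU."""
    args = ["vmkping", "-I", vmknic, "-s", str(icmp_payload_for_mtu(mtu))]
    if dont_fragment:
        args.append("-d")  # set the DF bit so an MTU mismatch makes the ping fail
    args.append(target_ip)
    return " ".join(args)


# vmk1 and 192.0.2.12 are placeholders for your vSAN vmknic and a peer host.
print(vmkping_command("vmk1", "192.0.2.12", mtu=9000))
# vmkping -I vmk1 -s 8972 -d 192.0.2.12
```

With the don't-fragment bit set, the ping only succeeds if every hop on the path accepts the full packet size, which is what makes a mismatched MTU visible. This is stricter than a ping that is allowed to fragment.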

vMotion: Basic (unicast) connectivity check

This test is similar to the basic unicast connectivity test for vSAN (above), but it is run against the vMotion vmkernel adapters.

vMotion: MTU check (ping with large packet size)

This test is similar to the MTU test for vSAN (above), but it is run against the vMotion vmkernel adapters.

Network latency check

This test measures the network latency between the vSAN hosts. Ideally vSAN expects a latency of less than 1 ms for local clusters and less than 5 ms for stretched clusters. If the latency is above 5 ms, this test raises a warning.
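The warning rule described above can be sketched as a simple threshold comparison. The 5 ms value comes from the text. The host names, the sample measurements and the function names are made up for illustration, and the actual health check may evaluate latency differently.

```python
WARN_THRESHOLD_MS = 5.0  # from the text: latency above 5 ms raises a warning


def latency_status(latency_ms, threshold_ms=WARN_THRESHOLD_MS):
    """Classify one host-to-host latency measurement."""
    return "warning" if latency_ms > threshold_ms else "ok"


def pairs_over_threshold(measurements, threshold_ms=WARN_THRESHOLD_MS):
    """Return the (source, destination) pairs whose latency exceeds the threshold."""
    return sorted(
        pair
        for pair, latency_ms in measurements.items()
        if latency_status(latency_ms, threshold_ms) == "warning"
    )


# Made-up measurements in milliseconds between hypothetical hosts.
measurements = {
    ("esx01", "esx02"): 0.4,
    ("esx01", "esx03"): 7.2,
    ("esx02", "esx03"): 5.0,
}

print(pairs_over_threshold(measurements))
```

A measurement of exactly 5 ms is treated as "ok" here because the text only mentions a warning above 5 ms.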

That’s all for now.

Continue reading? Here are the other parts:

Part 1 -> https://vxplanet.com/2019/01/30/vsan-health-checks-explained-part-1/

Part 3 -> https://vxplanet.com/2019/03/11/vsan-health-checks-explained-part-3/

Part 4 -> https://vxplanet.com/2019/03/22/vsan-health-checks-explained-part-4/

Part 5 -> https://vxplanet.com/2019/03/29/vsan-health-checks-explained-part-5/

 
