Hello everyone, in this post we will walk through an introduction to the NSX Controller cluster and how the logical switching and routing control plane is handled. The Controller cluster is a distributed state-management system for the logical switches and routers deployed on the platform. The cluster pushes this logical information down to the ESXi hypervisors for in-kernel switching and routing, and the information is also kept in sync with the NSX Manager.
The Controller cluster doesn’t have any data plane traffic flowing through it. Because the data plane and control plane are isolated, a control plane failure has no direct impact on data plane traffic, provided the hosts already have the in-kernel logical information they need, i.e., the VTEP tables, MAC tables, ARP tables etc.
The Controller cluster is deployed with a minimum of 3 nodes, which is necessary to maintain a quorum. The Zookeeper component of the controllers ensures there is always a majority, using a Paxos-style consensus algorithm to avoid a split-brain situation.
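As a quick illustration of why three nodes are the minimum, here is a minimal sketch of the majority rule in plain Python. This is illustrative only; in NSX this logic lives inside the controllers' Zookeeper component.

```python
# Minimal sketch of the majority (quorum) rule -- illustrative only;
# in NSX this is handled by the controllers' Zookeeper component.

def has_quorum(total_nodes: int, live_nodes: int) -> bool:
    """The cluster keeps quorum only while a strict majority of nodes is up."""
    return live_nodes > total_nodes // 2

# A 3-node cluster survives one node failure (2 of 3 is a majority)...
print(has_quorum(3, 2))   # True
# ...but not two failures, and a 2-node cluster cannot survive any:
print(has_quorum(3, 1))   # False
print(has_quorum(2, 1))   # False -- a strict majority of 2 needs 2 nodes
```

This is also why you never deploy an even number of controllers: losing half the nodes leaves no strict majority, and the cluster stops making changes rather than risk split-brain.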
A Controller cluster has 5 roles – API Provider, Persistence Server, Switch Manager, Logical Manager & Directory Server. Each role has a master controller node. If the master controller node for a role goes offline for some reason, a new master is elected for that role.
The above figure shows that the Controller-1 (192.168.11.200) is the master for the roles and the other controllers are slaves.
Now let's look at a logical architecture and understand how the logical switching and routing information is handled by the Controllers.
Here I have created a tenant with two logical segments that are routed by a DLR instance. VM1 and VM2 attach to logical switch 5000, and VM3 and VM4 attach to logical switch 5001. Each VM is on a separate physical host. Since VM1 and VM2 are on the same logical switch but on two different hosts, the VTEP table for VNID 5000 will have entries for both hosts 1 and 2. It won't have entries for hosts 3 and 4; those entries are populated only when you power on a VM on those hosts, or migrate a VM on the same logical switch to them.
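The VTEP-table behaviour described above can be modelled in a few lines of Python. This is an illustrative sketch, not NSX code, and the host VTEP IPs are made up: a host's VTEPs appear in a VNID's table only once a VM on that logical switch is running on that host.

```python
from collections import defaultdict

# Illustrative model of per-VNID VTEP tables. The VTEP IPs are
# made up for this example; this is not NSX code.
vtep_table = defaultdict(set)          # VNID -> set of VTEP IPs

def vm_powered_on(vnid: int, host_vteps: list[str]) -> None:
    """When a VM on a logical switch powers on (or vMotions in),
    its host's VTEPs join that VNID's VTEP table."""
    vtep_table[vnid].update(host_vteps)

# VM1 on host1 and VM2 on host2, both on logical switch 5000:
vm_powered_on(5000, ["192.168.21.1", "192.168.21.2"])   # host1 VTEPs
vm_powered_on(5000, ["192.168.21.3", "192.168.21.4"])   # host2 VTEPs

# Hosts 3 and 4 have no VMs on VNID 5000, so they never appear:
print(sorted(vtep_table[5000]))
# ['192.168.21.1', '192.168.21.2', '192.168.21.3', '192.168.21.4']
```

The payoff of this behaviour is that BUM traffic for a VNID is only ever replicated to hosts that actually have workloads on that logical switch.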
Logical Switching control plane
For each logical switch that you deploy, one of the controller nodes becomes the master for that logical switch and maintains all logical switching information for that VNI segment. This controller node maintains the VTEP table, BFD table, ARP table, MAC table and connection table for that particular VNID. This is how you see the VNI-Table, which shows the controller-to-VNID mapping.
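Conceptually, the VNIDs are sliced across the controller nodes so that every logical switch has exactly one master. Here is a toy sketch of that idea; the modulo assignment is my own simplification, not NSX's actual slicing algorithm:

```python
# Toy VNID-to-controller assignment. NSX uses its own slicing
# mechanism; the modulo here only illustrates the idea that every
# VNID maps to exactly one master controller.
controllers = ["192.168.11.200", "192.168.11.201", "192.168.11.202"]

def master_for_vni(vnid: int) -> str:
    """Return the (toy) master controller for a given VNID."""
    return controllers[vnid % len(controllers)]

vni_table = {vnid: master_for_vni(vnid) for vnid in (4999, 5000, 5001)}
for vnid, master in sorted(vni_table.items()):
    print(vnid, "->", master)
```

Slicing spreads the load: no single controller carries the switching state for every VNID, and when a node fails only its slices need a new master.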
The above screenshot shows that controller-3 (192.168.11.202) is the authoritative controller for several logical switches – VNIDs 4999, 5000, 5002, 5004 & 5005. Let's pick VNID 5000 and look at its different tables.
VTEP Table -> This shows hosts 1 & 2 in the table. You see 4 VTEP IPs here because each host has two NICs configured for VXLAN.
Connection Table -> This shows the connections established on VNID 5000
ARP Table -> This is the ARP Table for the VNID 5000
MAC Table -> This maps the MAC addresses of the VMs on VNID 5000 to their hosts' VTEP IPs.
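Together, the ARP and MAC tables let the control plane answer "where does this frame need to go?" without flooding. A small sketch of the two lookups, with made-up MACs and IPs (this is illustrative, not NSX code):

```python
# Illustrative per-VNID tables for VNID 5000 (all values made up).
mac_table = {                      # VM MAC -> VTEP IP of its host
    "00:50:56:aa:aa:01": "192.168.21.1",   # VM1 on host1
    "00:50:56:aa:aa:02": "192.168.21.3",   # VM2 on host2
}
arp_table = {                      # VM IP -> VM MAC (ARP suppression)
    "172.16.10.11": "00:50:56:aa:aa:01",
    "172.16.10.12": "00:50:56:aa:aa:02",
}

def resolve(ip: str) -> tuple[str, str]:
    """ARP suppression plus MAC lookup: IP -> (MAC, VTEP to tunnel to)."""
    mac = arp_table[ip]
    return mac, mac_table[mac]

print(resolve("172.16.10.12"))   # ('00:50:56:aa:aa:02', '192.168.21.3')
```

The ARP-table half of this lookup is what enables ARP suppression: the host can answer a VM's ARP request locally instead of broadcasting it across the transport network.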
Similarly, for VNID 5001 the authoritative controller is controller-1 (192.168.11.200). You need to execute the above commands on that controller node to see its tables.
Logical Routing control plane
Now let's see how the routing information for the DLR instances is maintained. Similar to how logical switch information is handled, one of the controller nodes is elected as the master to maintain all logical routing information for a particular DLR instance.
This is how the mapping table of logical router instances to controllers looks:
This shows two DLRs, each of whose logical routing information is maintained by a different controller. In the architecture above I mentioned only one DLR; the extra one is a dummy instance that I created for this article.
Routing Table -> This is the routing table of the DLR instance as maintained by this controller node. This routing information is pushed down to the hypervisor kernel of all the ESXi hosts in the transport zone to enable in-kernel routing.
The next-hop IP points to the ESG, and the Preference value of 100 denotes that the route was learned via OSPF peering.
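The preference value behaves like an administrative distance: if the same prefix were known from more than one source, the route with the lowest preference would win. A toy sketch of that selection in Python; the prefixes, next hops and the second preference value are assumptions for illustration only:

```python
# Illustrative best-route selection by preference (lower wins).
# Prefixes, next hops and the non-OSPF preference are made up.
routes = [
    # (prefix, next_hop, preference, learned_from)
    ("0.0.0.0/0",     "192.168.10.1", 100, "OSPF"),
    ("10.20.30.0/24", "192.168.10.1", 100, "OSPF"),
    ("10.20.30.0/24", "192.168.10.9", 110, "other"),   # hypothetical source
]

def best_route(prefix: str):
    """Among all candidates for a prefix, the lowest preference wins."""
    candidates = [r for r in routes if r[0] == prefix]
    return min(candidates, key=lambda r: r[2])

print(best_route("10.20.30.0/24"))   # the OSPF route (preference 100) wins
```

Only the winning route per prefix is what gets pushed down to the ESXi hosts, so the hypervisor kernel never has to arbitrate between sources itself.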
Interface Summary Table -> This table represents the LIFs of the DLR.
If the DLR has an L2 bridge instance configured, you may see a VLAN LIF as well. In my case, I configured a bridge instance on my second DLR, and its interface summary table looks like the below:
Controller-to-controller communication happens over IPsec. IPsec settings should be managed via the NSX API; see the KB below for more information.
This is how you view the status of IPSec tunnels.
Verifying the Controller Status
You can verify the controller status from the NSX manager UI as well as from the controller CLI.
The controller CLI gives you a more detailed status. It shows whether this controller has joined the cluster majority, and its connection status with its peer controllers.
In case you need to reboot a controller node, the CLI shows you how safe it is to restart, so you can avoid unexpected post-reboot sync issues.
That's all for this post. I will continue with Part 2 on how to recover the Controller cluster from a failure. Thanks for reading!