Welcome back. We are now at Part 2 of NSX-T 3.0 Edge Cluster Automated Deployment and Architecture in VCF 4.0.
In case you missed Part 1, you can read it here:
In this article, let’s do a console walkthrough to review the deployment and look at some additional manual configurations to be aware of if we plan to consume the Edge Cluster for K8s Workload Management in vSphere 7.
The below tasks are already automated by VCF; we are just reviewing them to build a better understanding for troubleshooting purposes. If you are already familiar with them, you can skip ahead to the section titled “Additional Considerations to be aware of”.
vSphere Resource Pools
The Edge nodes are deployed on dedicated resource pools in the VI Workload domain.
vSphere Anti-affinity rules are created to ensure that Edge nodes will always run on separate hosts.
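As a quick illustration of what the anti-affinity rule guarantees, here is a minimal sketch (pure Python, with illustrative VM and host names, not what VCF actually generates) that validates the effect from a VM-to-host placement map:

```python
def edges_are_separated(placement, edge_vms):
    """Return True if no two Edge VMs run on the same ESXi host."""
    hosts = [placement[vm] for vm in edge_vms]
    return len(hosts) == len(set(hosts))

# Illustrative inventory: both Edge nodes land on different hosts.
placement = {"edge01": "esxi-01", "edge02": "esxi-02", "app-vm": "esxi-01"}
print(edges_are_separated(placement, ["edge01", "edge02"]))  # True
```

This is the invariant DRS enforces continuously; the check above is only a troubleshooting aid for inventory exports.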
Compute host networking (DVS)
Edge nodes use a single-NVDS, Multi-TEP architecture. Three vNICs (out of four) on the Edge nodes are used for Edge networking. The first vNIC attaches to the host management portgroup (the same one as ESXi management) and the other two connect to Trunk VLAN portgroups on the host DVS. The necessary dot1q tags for the Edges are applied from within the Edge node itself: TEP tags are applied through Uplink Profiles, and T0 uplink tags are applied to VLAN Logical Segments.
Teaming policies are configured on the host DVS so that each trunk VLAN portgroup is mapped to separate host uplinks.
Edge VM eth0 -> attaches to the host management portgroup. This is the same VLAN used for ESXi host management.
Edge VM fp-eth0 -> attaches to Trunk VLAN Portgroup 1 on the host DVS
Edge VM fp-eth1 -> attaches to Trunk VLAN Portgroup 2 on the host DVS
Edge nodes are part of two Transport Zones – Overlay and VLAN. The Overlay TZ is the same one that the Compute hosts are part of. If the VI Workload domain is used for the vSphere Workload Management use case, make sure to deploy only one Edge Cluster on this Overlay TZ.
Compute hosts don’t require a VLAN backed TZ as they leverage vSphere DVS for VLAN networking (from vSphere 7, we have a Converged VDS).
Uplink Profiles and Named Teaming Policies
Uplink Profile sets the Edge TEP VLAN (108), TEP mode (Multi-TEP) and Named Teaming Policies for deterministic peering.
The ‘Default Teaming Policy’ is set to ‘Load Balance Source’ over the two active uplink interfaces, enabling Multi-TEP. The Uplink1 & Uplink2 logical constructs here map to fp-eth0 & fp-eth1 in the Transport Node configuration wizard.
Named Teaming Policies are used for the T0 Uplink VLAN Logical Segments to achieve deterministic eBGP peering with the Leafs over pre-determined uplink interfaces (uplink1 or uplink2).
VLAN 106 T0 Uplink Traffic -> Edge uplink1 (Policy applied to the VLAN LS)
VLAN 107 T0 Uplink Traffic -> Edge uplink2 (Policy applied to the VLAN LS)
Named Teaming Policies are applied to the VLAN TZ of which the Edge nodes are members.
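To make the structure concrete, here is a sketch of what such an uplink profile looks like as an NSX-T API object. The field values match this deployment (TEP VLAN 108, Multi-TEP default policy, two failover-order named teamings); display names are illustrative, not what VCF generates:

```python
# Illustrative uplink profile, modeled on the NSX-T uplink profile object.
edge_uplink_profile = {
    "display_name": "uplink-profile-edge",   # illustrative name
    "transport_vlan": 108,                   # Edge TEP VLAN
    "teaming": {                             # default policy: Multi-TEP
        "policy": "LOADBALANCE_SRCID",
        "active_list": [
            {"uplink_name": "uplink1", "uplink_type": "PNIC"},
            {"uplink_name": "uplink2", "uplink_type": "PNIC"},
        ],
    },
    "named_teamings": [                      # deterministic T0 uplink pinning
        {"name": "to-leaf1-vlan106", "policy": "FAILOVER_ORDER",
         "active_list": [{"uplink_name": "uplink1", "uplink_type": "PNIC"}]},
        {"name": "to-leaf2-vlan107", "policy": "FAILOVER_ORDER",
         "active_list": [{"uplink_name": "uplink2", "uplink_type": "PNIC"}]},
    ],
}
```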
Transport Node Configuration
A single NVDS is deployed that is part of both TZs. TEP IP addresses are statically configured from the inputs specified in the VCF workflow. FYI, host TEP IPs are assigned from an external DHCP server (as part of the VCF workflow).
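For reference, the static TEP assignment inside the Edge transport node configuration is shaped roughly like the sketch below (two TEP IPs because of Multi-TEP; the addresses and gateway are illustrative, the real values come from the VCF workflow inputs):

```python
# Illustrative static TEP IP assignment, modeled on the NSX-T
# transport node IP assignment spec.
tep_ip_assignment = {
    "resource_type": "StaticIpListSpec",
    "ip_list": ["192.168.108.21", "192.168.108.22"],  # one TEP per uplink
    "subnet_mask": "255.255.255.0",
    "default_gateway": "192.168.108.1",               # illustrative
}
```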
Both edges are added to the Edge Cluster
T0 Uplink VLAN Logical Segments
Two Logical Segments are created for the T0 Uplinks over VLANs 106 and 107. Named Teaming Policies created earlier are applied here.
The T0 Gateway is deployed in Active-Active mode as specified in the workflow. There are four interfaces – two via Edge node 1 and two via Edge node 2, over VLANs 106 and 107 – which will peer with the Leaf switches over BGP.
BGP neighborship with the Leafs is established over VLANs 106 and 107. Leaf 1 peers with the Edges over VLAN 106 and Leaf 2 peers over VLAN 107.
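A sketch of one such neighbor entry, modeled on the NSX-T Policy API BGP neighbor object, helps show how the deterministic peering lands. All IP addresses and the ASN below are illustrative; in this design Leaf 1 peers over VLAN 106 (uplink1 on both Edges) and Leaf 2 over VLAN 107:

```python
# Illustrative BGP neighbor entry for Leaf 1 on the T0 Gateway.
leaf1_neighbor = {
    "display_name": "leaf1-vlan106",      # illustrative name
    "neighbor_address": "192.168.106.1",  # Leaf 1 SVI on VLAN 106
    "remote_as_num": "65001",             # example leaf ASN
    "source_addresses": [                 # T0 uplink IPs on VLAN 106
        "192.168.106.2",                  # Edge node 1, uplink1
        "192.168.106.3",                  # Edge node 2, uplink1
    ],
}
```

A mirror-image entry for Leaf 2 would use the VLAN 107 addresses and the uplink2 interfaces.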
BFD is not enabled as part of the workflow; we will set it up manually towards the end.
Let’s confirm this. This is the output from Edge Node 1.
This is the output from Leaf Switch 1.
Additional Considerations to be aware of
If the use case for the VI Workload domain and Edge Cluster is to support Workload Management in vSphere 7, then be aware of a default ‘Deny’ route-map that is applied to the redistribution criteria in the BGP process of the Tier-0 Gateway. This prevents all prefixes from being advertised to the Leaf switches.
For more details, please read the official documentation below:
Since our scenario is to support Workload Management, we will replace the route-map with one that allows the Ingress and Egress CIDRs to/from the Tenants. The Ingress/Egress CIDR is carved out of the block 192.168.105.0/24.
Our Ingress/Egress CIDR pool for Tenants is 192.168.105.0/24; /32 addresses are carved out of it for use in SNAT rules and Load Balancer VIPs. They are ultimately advertised in BGP as /32 prefixes, which results in an unnecessarily large number of entries on the upstream switches and routers. Below is the screengrab from Leaf 1.
Let’s summarize those routes at the T0 Gateway and confirm that only the Summary address is advertised.
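The reason summarization is safe here can be checked in a few lines with Python’s `ipaddress` module: every /32 carved from the pool is covered by the /24 summary, so advertising the summary alone is sufficient upstream (the specific /32s below are illustrative):

```python
import ipaddress

summary = ipaddress.ip_network("192.168.105.0/24")
# Illustrative SNAT/VIP host routes carved from the Ingress/Egress pool.
advertised_32s = ["192.168.105.10/32", "192.168.105.11/32", "192.168.105.20/32"]

# Every /32 must fall inside the summary we advertise instead.
assert all(ipaddress.ip_network(p).subnet_of(summary) for p in advertised_32s)
print(f"All {len(advertised_32s)} host routes covered by {summary}")
```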
When using an Active-Active, ECMP-enabled T0 Gateway (Edges) for Workload Management in vSphere, we may face ingress/egress issues with pod networking. I hit this issue where the pods were unable to pull images from external repositories even though K8s network policies were configured for ingress/egress to/from the pod networks. We might need to relax the URPF mode on the T0 Gateway interfaces to fix this.
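A sketch of what that change looks like against the NSX-T Policy API is below. The Tier-0 ID, locale-services ID, and interface ID in the path are illustrative, not what VCF generates; the same PATCH would be repeated for each of the four uplink interfaces:

```python
# Relax unicast reverse-path forwarding on a T0 uplink interface.
urpf_patch = {"urpf_mode": "NONE"}  # default mode is STRICT

# Illustrative Policy API path (IDs are placeholders):
url = ("/policy/api/v1/infra/tier-0s/vcf-t0/locale-services/default"
       "/interfaces/edge1-uplink1")
# PATCH `url` with `urpf_patch` as the JSON body, once per uplink interface.
```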
BFD is not enabled in the BGP process by the VCF workflow and needs to be enabled manually. Note that it needs to be enabled both on the Leaf switches and on the T0 Gateway.
This is the BFD neighbor table from Leaf 1; it should see both Edge nodes.
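On the NSX side, enabling BFD amounts to adding a BFD stanza to each BGP neighbor, sketched below against the Policy API BGP neighbor object. The peer address, ASN, and timer values are illustrative; whatever intervals you pick must match what is configured on the Leaf switches:

```python
# Illustrative BGP neighbor entry with BFD enabled on the T0 Gateway.
bfd_enabled_neighbor = {
    "neighbor_address": "192.168.106.1",  # illustrative Leaf peer
    "remote_as_num": "65001",             # example ASN
    "bfd": {
        "enabled": True,
        "interval": 500,                  # ms, must match the Leaf side
        "multiple": 3,                    # detection multiplier
    },
}
```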
Create a Default Static route on the T0 Gateway for internet access. Since we leverage Egress CIDR for this, make sure that this subnet (192.168.105.0/24 in our case) has necessary NAT mappings on the external firewall / gateways for internet access.
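The static route itself is a small object; a sketch modeled on the NSX-T Policy API static route is below. The next-hop address is illustrative (it would be your upstream gateway on the uplink VLAN):

```python
# Illustrative default static route on the T0 Gateway.
default_route = {
    "display_name": "default-to-internet",  # illustrative name
    "network": "0.0.0.0/0",
    "next_hops": [
        {"ip_address": "192.168.106.1", "admin_distance": 1},  # illustrative
    ],
}
```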
Optionally, if we need SSH access to the Edges, enable it.
We are now set to consume the Edge Cluster for our workloads. In the next article, we will use the VCF 4.0 Workflow to enable vSphere 7 Workload Management on the VI WLD and consume the Edge Cluster that we built. Stay tuned.
I hope this article was informative.
Thanks for reading