Welcome back. We are now at Part 2 of NSX-T 3.0 Edge Cluster Automated Deployment and Architecture in VCF 4.0.
In case you missed Part 1, you can read it here:
In this article, let’s do a console walkthrough to review the deployment and take a look at some additional manual configurations that we need to be aware of if we plan to consume the Edge Cluster for K8S Workload Management in vSphere 7.
The tasks below are already automated by VCF; we are just reviewing them to build a better understanding for troubleshooting purposes. If you are already familiar with them, you can skip ahead to the section titled "Additional Considerations to be aware of".
Let’s continue:
vSphere Resource Pools
The Edge nodes are deployed on dedicated resource pools in the VI Workload domain.
Anti-Affinity Rules
vSphere Anti-affinity rules are created to ensure that Edge nodes will always run on separate hosts.
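If you want to double-check this from a script rather than the vSphere Client, here is a minimal pyVmomi sketch that lists the VM anti-affinity rules per cluster. The vCenter FQDN and credentials below are placeholders for this lab; adjust them for your environment.

```python
# Minimal sketch: list VM anti-affinity rules on each cluster in the WLD vCenter.
# Hostname and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter-wld.lab.local",
                  user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)

for cluster in view.view:
    for rule in cluster.configurationEx.rule:
        # VM-VM anti-affinity rules keep the Edge nodes on separate hosts
        if isinstance(rule, vim.cluster.AntiAffinityRuleSpec):
            print(cluster.name, rule.name, [vm.name for vm in rule.vm])

Disconnect(si)
```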
Compute host networking (DVS)
Edge nodes use a single-NVDS, Multi-TEP architecture. Three vNICs (out of four) on each Edge node are used for Edge networking. The first vNIC attaches to the host management portgroup (the same VLAN as ESXi management) and the other two connect to trunk VLAN portgroups on the host DVS. The necessary dot1q tags for the Edges are applied from within the Edge node itself: TEP tags are applied through Uplink Profiles, and T0 uplink tags are applied on the VLAN Logical Segments.
Teaming policies are configured on the host DVS so that each trunk VLAN portgroup is mapped to separate host uplinks.
Edge VM eth0 -> attaches to the host management portgroup. This is the same VLAN used for ESXi host management.
Edge VM fp-eth0 -> attaches to Trunk VLAN Portgroup 1 on the host DVS
Edge VM fp-eth1 -> attaches to Trunk VLAN Portgroup 2 on the host DVS
Transport Zones
Edge nodes are a part of two Transport zones – Overlay and VLAN. This is the same Overlay TZ that the Compute hosts are a part of. If the VI Workload domain is used for vSphere Workload Management usecase, make sure that we deploy only one edge cluster on this Overlay TZ.
Compute hosts don’t require a VLAN backed TZ as they leverage vSphere DVS for VLAN networking (from vSphere 7, we have a Converged VDS).
Uplink Profiles and Named Teaming Policies
Uplink Profile sets the Edge TEP VLAN (108), TEP mode (Multi-TEP) and Named Teaming Policies for deterministic peering.
The ‘Default Teaming Policy’ is set to ‘Load Balance Source’ over the two active uplink interfaces, making it Multi-TEP. The Uplink1 & Uplink2 logical constructs here map to fp-eth0 & fp-eth1 in the Transport Node configuration wizard.
Named Teaming Policies are used for the T0 Uplink VLAN Logical Segments to achieve deterministic eBGP peering with the Leafs over pre-determined uplink interfaces (uplink1 or uplink2):
VLAN 106 T0 Uplink Traffic -> Edge uplink1 (Policy applied to the VLAN LS)
VLAN 107 T0 Uplink Traffic -> Edge uplink2 (Policy applied to the VLAN LS)
Named Teaming Policies are applied to the VLAN TZ of which the Edge nodes are members.
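These settings can also be reviewed outside the UI by pulling the uplink profile from the NSX Manager API. A rough sketch (the manager FQDN and credentials are placeholders for this lab):

```python
# Sketch: dump the Edge uplink profile settings (TEP VLAN, default teaming,
# named teamings) from the NSX Manager API. FQDN/credentials are placeholders.
import requests
requests.packages.urllib3.disable_warnings()

NSX = "https://nsx-wld.lab.local"
AUTH = ("admin", "VMware1!VMware1!")

resp = requests.get(f"{NSX}/api/v1/host-switch-profiles", auth=AUTH, verify=False)
resp.raise_for_status()

for profile in resp.json()["results"]:
    if profile.get("resource_type") != "UplinkHostSwitchProfile":
        continue
    print(profile["display_name"])
    print("  transport VLAN :", profile.get("transport_vlan"))    # expect 108 here
    print("  default policy :", profile["teaming"]["policy"])     # LOADBALANCE_SRCID
    for named in profile.get("named_teamings", []):
        active = [u["uplink_name"] for u in named.get("active_list", [])]
        print("  named teaming  :", named["name"], "->", active)
```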
Transport Node Configuration
A single NVDS is deployed and is part of both TZs. TEP IP addresses are statically assigned from the inputs specified in the VCF workflow. For reference, host TEP IPs are assigned from an external DHCP server (also as part of the VCF workflow).
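A similar check can be done against the Edge transport nodes to confirm the static TEP assignment; a rough sketch using the /api/v1/transport-nodes endpoint (manager FQDN and credentials are placeholders):

```python
# Sketch: confirm that Edge transport nodes use static TEP IP assignment.
import requests
requests.packages.urllib3.disable_warnings()

NSX = "https://nsx-wld.lab.local"
AUTH = ("admin", "VMware1!VMware1!")

nodes = requests.get(f"{NSX}/api/v1/transport-nodes", auth=AUTH, verify=False).json()
for tn in nodes["results"]:
    info = tn.get("node_deployment_info", {})
    if info.get("resource_type") != "EdgeNode":
        continue
    for hs in tn["host_switch_spec"]["host_switches"]:
        spec = hs.get("ip_assignment_spec", {})
        # StaticIpListSpec = TEP IPs taken from the VCF workflow inputs
        print(tn["display_name"], spec.get("resource_type"), spec.get("ip_list"))
```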
Edge Cluster
Both Edge nodes are added to the Edge Cluster.
T0 Uplink VLAN Logical Segments
Two Logical Segments are created for the T0 Uplinks over VLANs 106 and 107. Named Teaming Policies created earlier are applied here.
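For reference, this is roughly what such a segment definition looks like through the Policy API; the segment ID, transport zone path and teaming policy name below are placeholders for whatever VCF generated in your environment:

```python
# Sketch: a VLAN-backed T0 uplink segment pinned to a named teaming policy.
# Segment ID, transport zone path and teaming policy name are placeholders.
import requests
requests.packages.urllib3.disable_warnings()

NSX = "https://nsx-wld.lab.local"
AUTH = ("admin", "VMware1!VMware1!")

segment = {
    "display_name": "t0-uplink-vlan106",
    "vlan_ids": ["106"],
    "transport_zone_path": "/infra/sites/default/enforcement-points/default/"
                           "transport-zones/<edge-vlan-tz-id>",
    # Pin this uplink segment to uplink1 via the named teaming policy
    "advanced_config": {"uplink_teaming_policy_name": "uplink1-only"},
}

r = requests.patch(f"{NSX}/policy/api/v1/infra/segments/t0-uplink-vlan106",
                   json=segment, auth=AUTH, verify=False)
print(r.status_code)
```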
T0 Gateway
The T0 Gateway is deployed in Active-Active mode as specified in the workflow. There are four interfaces – two via Edge node 1 and two via Edge node 2, over VLANs 106 and 107 – which will peer with the Leaf switches over BGP.
BGP Configuration
BGP neighborship with the Leafs is established over VLANs 106 and 107. Leaf 1 peers with the Edges over VLAN 106 and Leaf 2 peers over VLAN 107.
BFD is not enabled as part of the workflow; we will set it up manually towards the end.
Let’s confirm this. This is the output from Edge Node 1.
This is the output from Leaf Switch 1.
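If you prefer the API to screenshots, the BGP neighbors (and their BFD flag) can also be pulled from the Policy API; a rough sketch, with the VCF-generated T0 and locale-services IDs shown as placeholders:

```python
# Sketch: list the BGP neighbors VCF created on the T0 Gateway.
# <t0-id> and <locale-services-id> are placeholders for the VCF-generated IDs.
import requests
requests.packages.urllib3.disable_warnings()

NSX = "https://nsx-wld.lab.local"
AUTH = ("admin", "VMware1!VMware1!")
T0, LS = "<t0-id>", "<locale-services-id>"

url = f"{NSX}/policy/api/v1/infra/tier-0s/{T0}/locale-services/{LS}/bgp/neighbors"
for n in requests.get(url, auth=AUTH, verify=False).json()["results"]:
    print(n["display_name"], n["neighbor_address"], "AS", n["remote_as_num"],
          "BFD:", n.get("bfd", {}).get("enabled", False))
```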
Additional Considerations to be aware of
First
If the use case for the VI Workload domain and Edge Cluster is to support Workload Management in vSphere 7, be aware of a default ‘Deny’ route-map that is applied to the redistribution criteria in the BGP process of the Tier-0 Gateway. This prevents all prefixes from being advertised to the Leaf switches.
For more details, please read the official documentation below:
Since our scenario is to support Workload Management, we will replace the route-map with one that allows the Ingress and Egress CIDRs to/from the Tenants. The Ingress/Egress CIDRs are carved out of the block 192.168.105.0/24.
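Here is a rough sketch of that manual change via the Policy API. The object IDs, names and the exact redistribution sources below are placeholders and assumptions; adjust them to match what the VCF workflow created in your environment:

```python
# Sketch: prefix list + permit route-map for 192.168.105.0/24, then swap it into
# the T0 redistribution rule in place of the default 'Deny' route-map.
import requests
requests.packages.urllib3.disable_warnings()

NSX = "https://nsx-wld.lab.local"
AUTH = ("admin", "VMware1!VMware1!")
T0, LS = "<t0-id>", "<locale-services-id>"
BASE = f"{NSX}/policy/api/v1/infra/tier-0s/{T0}"

# 1. Prefix list matching the Ingress/Egress CIDR block (and anything under it)
requests.patch(f"{BASE}/prefix-lists/tanzu-ingress-egress",
               json={"prefixes": [{"network": "192.168.105.0/24",
                                   "le": 32, "action": "PERMIT"}]},
               auth=AUTH, verify=False)

# 2. Route map that permits the prefix list
requests.patch(f"{BASE}/route-maps/tanzu-permit",
               json={"entries": [{
                   "action": "PERMIT",
                   "prefix_list_matches":
                       [f"/infra/tier-0s/{T0}/prefix-lists/tanzu-ingress-egress"]}]},
               auth=AUTH, verify=False)

# 3. Point the redistribution rule at the new route map
redist = {
    "route_redistribution_config": {
        "redistribution_rules": [{
            "name": "tanzu-redistribution",
            "route_redistribution_types": ["TIER1_CONNECTED", "TIER1_NAT",
                                           "TIER1_LB_VIP", "TIER1_LB_SNAT"],
            "route_map_path": f"/infra/tier-0s/{T0}/route-maps/tanzu-permit",
        }]
    }
}
requests.patch(f"{BASE}/locale-services/{LS}", json=redist, auth=AUTH, verify=False)
```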
Second
Our Ingress/Egress CIDR pool for Tenants is 192.168.105.0/24; /32 addresses are carved out of this for use in SNAT rules and load balancer VIPs. They are ultimately advertised in BGP as /32 prefixes, which results in an unnecessarily large number of entries on the upstream switches and routers. Below is the screengrab from Leaf 1.
Let’s summarize those routes at the T0 Gateway and confirm that only the Summary address is advertised.
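One way to do this is a BGP route aggregation on the T0 with summary_only set, so only the /24 is advertised and the /32s are suppressed. A hedged sketch (IDs are placeholders):

```python
# Sketch: summarize the /32s back to the /24 with a BGP route aggregation.
import requests
requests.packages.urllib3.disable_warnings()

NSX = "https://nsx-wld.lab.local"
AUTH = ("admin", "VMware1!VMware1!")
T0, LS = "<t0-id>", "<locale-services-id>"

# summary_only suppresses the more-specific /32 prefixes from being advertised
body = {"route_aggregations": [{"prefix": "192.168.105.0/24", "summary_only": True}]}
r = requests.patch(f"{NSX}/policy/api/v1/infra/tier-0s/{T0}"
                   f"/locale-services/{LS}/bgp", json=body, auth=AUTH, verify=False)
print(r.status_code)
```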
Third
When using an Active-Active, ECMP-enabled T0 Gateway (Edges) for Workload Management in vSphere, there is a chance of ingress/egress issues with pod networking. I hit this issue where pods were unable to pull images from external repositories even though K8s network policies were configured for ingress/egress to/from the pod networks. We might need to relax the URPF mode on the T0 Gateway interfaces to fix this.
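A sketch of how that relaxation could be pushed through the Policy API, setting urpf_mode to NONE on each T0 uplink interface (the interface IDs below are placeholders for the four interfaces the workflow created):

```python
# Sketch: relax URPF to NONE on each T0 uplink interface. IDs are placeholders.
import requests
requests.packages.urllib3.disable_warnings()

NSX = "https://nsx-wld.lab.local"
AUTH = ("admin", "VMware1!VMware1!")
T0, LS = "<t0-id>", "<locale-services-id>"

for intf in ("<uplink-if-1>", "<uplink-if-2>", "<uplink-if-3>", "<uplink-if-4>"):
    r = requests.patch(f"{NSX}/policy/api/v1/infra/tier-0s/{T0}"
                       f"/locale-services/{LS}/interfaces/{intf}",
                       json={"urpf_mode": "NONE"}, auth=AUTH, verify=False)
    print(intf, r.status_code)
```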
Fourth
BFD is not enabled in the BGP process by the VCF workflow, so this needs to be enabled manually. Note that it needs to be enabled on both the Leaf switches and the T0 Gateway.
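On the NSX side this is done per BGP neighbor; a sketch assuming the bfd settings on the Policy API neighbor objects (neighbor IDs and timers are placeholders; match the timers to what the Leafs use):

```python
# Sketch: enable BFD on each BGP neighbor of the T0. IDs/timers are placeholders.
import requests
requests.packages.urllib3.disable_warnings()

NSX = "https://nsx-wld.lab.local"
AUTH = ("admin", "VMware1!VMware1!")
T0, LS = "<t0-id>", "<locale-services-id>"

for neighbor in ("<bgp-neighbor-1>", "<bgp-neighbor-2>"):
    body = {"bfd": {"enabled": True, "interval": 500, "multiple": 3}}
    r = requests.patch(f"{NSX}/policy/api/v1/infra/tier-0s/{T0}"
                       f"/locale-services/{LS}/bgp/neighbors/{neighbor}",
                       json=body, auth=AUTH, verify=False)
    print(neighbor, r.status_code)
```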
This is the BFD neighbor table from Leaf 1; it should see both Edge nodes.
Fifth
Create a default static route on the T0 Gateway for internet access. Since we leverage the Egress CIDR for this, make sure that this subnet (192.168.105.0/24 in our case) has the necessary NAT mappings on the external firewalls/gateways for internet access.
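A sketch of what that static route could look like via the Policy API; the next-hop addresses below are placeholders for the Leaf SVI / upstream gateway IPs in your uplink VLANs:

```python
# Sketch: a 0.0.0.0/0 static route on the T0. Next-hop IPs are placeholders.
import requests
requests.packages.urllib3.disable_warnings()

NSX = "https://nsx-wld.lab.local"
AUTH = ("admin", "VMware1!VMware1!")
T0 = "<t0-id>"

route = {
    "display_name": "default-route",
    "network": "0.0.0.0/0",
    "next_hops": [{"ip_address": "192.168.106.1", "admin_distance": 1},
                  {"ip_address": "192.168.107.1", "admin_distance": 1}],
}
r = requests.patch(f"{NSX}/policy/api/v1/infra/tier-0s/{T0}"
                   f"/static-routes/default-route",
                   json=route, auth=AUTH, verify=False)
print(r.status_code)
```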
Sixth
Optionally, if we need SSH access to the Edges, enable it (this can be done from the Edge node console, e.g. with ‘start service ssh’ and ‘set service ssh start-on-boot’).
We are now set to consume the Edge Cluster for our workloads. In the next article, we will use the VCF 4.0 Workflow to enable vSphere 7 Workload Management on the VI WLD and consume the Edge Cluster that we built. Stay tuned.
I hope this article was informative.
Thanks for reading
Is there a reason you create a manual static route on the T0 as opposed to simply having the leaf switches (or something upstream) advertise it over BGP? Given the T0 Leaf ECMP options, it seems doing that over BGP would better handle failure conditions vs a static route over one of the specific uplink subnets.
Hi Jefferson, Yes advertising a default route via the Leaf switches would be the best option. Thanks for pointing that out.
Thanks a lot for this tutorial part, it helped me fix the problem of deploying Tanzu in my environment (I was getting the error of configuring the cluster NIC on the master VM).
Thanks Mudjomba, glad to know it helped. Cheers
Can you elaborate on the following statement you included under the Transport Zone section: “If the VI Workload domain is used for vSphere Workload Management usecase, make sure that we deploy only one edge cluster on this Overlay TZ”?
I ask because if I deploy a T1 within a WLD, I wouldn’t want to share the T0 Edge Cluster with a T1 Gateway because of the impact it would have around N/S ECMP (which you’ve covered in a previous blog post). I would want to deploy a different Edge Cluster for the T1, yet your comment suggests not to do that.
Any clarification you can provide would be greatly appreciated. Thank you!
Hi Luis, actually one Edge Cluster per Overlay Transport Zone was a limitation in VCF 4.0 to support Workload Management (vSphere with Tanzu), which was lifted in subsequent releases. Currently multiple Edge Clusters per TZ are supported. Thanks