NSX-T 3.0 Edge Cluster Automated Deployment and Architecture in VCF 4.0 – Part 2


Welcome back. We are now at Part 2 of NSX-T 3.0 Edge Cluster Automated Deployment and Architecture in VCF 4.0. 

In case you missed Part 1, you can read it here:

https://vxplanet.com/2020/04/25/nsx-t-3-0-edge-cluster-automated-deployment-and-architecture-in-vcf-4-0-part-1/

In this article, let’s do a console walkthrough to review the deployment and take a look at some additional manual configurations that we need to be aware of if we plan to consume the Edge Cluster for K8S Workload Management in vSphere 7.

The tasks below are already automated by VCF; we are just reviewing them to build a better understanding for troubleshooting purposes. If you are already familiar with them, you can skip ahead to the section titled “Additional Considerations to be aware of”.

Let’s continue:

vSphere Resource Pools

The Edge nodes are deployed on dedicated resource pools in the VI Workload domain.


Anti-Affinity Rules

vSphere Anti-affinity rules are created to ensure that Edge nodes will always run on separate hosts.


Compute Host Networking (DVS)

Edge nodes use a single-NVDS, Multi-TEP architecture. Three of the Edge VM's four vNICs are used for edge networking. The first vNIC attaches to the host management portgroup (the same VLAN as ESXi management) and the other two connect to trunk VLAN portgroups on the host DVS. The necessary dot1q tags for the Edges are applied from within the Edge node itself: TEP VLAN tags are applied through Uplink Profiles, and T0 uplink VLAN tags are applied to the VLAN Logical Segments.

Teaming policies are configured on the host DVS so that each trunk VLAN portgroup is mapped to a separate host uplink. The Edge VM vNIC mapping is shown below, followed by a short sketch to verify it from vCenter.

Edge VM eth0 -> Attaches to the host management portgroup. This is the same VLAN used for ESXi host management.

Edge VM fp-eth0 -> Attaches to Trunk VLAN Portgroup 1 on the host DVS

Edge VM fp-eth1 -> Attaches to Trunk VLAN Portgroup 2 on the host DVS
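For reference, here is a minimal sketch (using pyVmomi) that lists each Edge VM vNIC and the portgroup it connects to, so the mapping above can be verified from vCenter. The vCenter FQDN, the credentials and the assumption that the Edge VMs are named "edge*" are placeholders for this lab.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()                  # lab only - skip cert checks
si = SmartConnect(host="vcenter-wld.vxplanet.local",    # hypothetical vCenter FQDN
                  user="administrator@vsphere.local", pwd="********", sslContext=ctx)
content = si.RetrieveContent()

vms = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
for vm in vms.view:
    if not vm.name.lower().startswith("edge"):          # assumes Edge VMs are named edge*
        continue
    for dev in vm.config.hardware.device:
        if isinstance(dev, vim.vm.device.VirtualEthernetCard):
            backing = dev.backing
            if isinstance(backing, vim.vm.device.VirtualEthernetCard.DistributedVirtualPortBackingInfo):
                # DVS-backed vNIC: print the distributed portgroup key it connects to
                print(vm.name, dev.deviceInfo.label, "-> DVPortgroup key", backing.port.portgroupKey)
            else:
                print(vm.name, dev.deviceInfo.label, "->", getattr(backing, "deviceName", "n/a"))
Disconnect(si)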


Transport Zones

Edge nodes are part of two Transport Zones: Overlay and VLAN. This is the same Overlay TZ that the Compute hosts are part of. If the VI Workload domain is used for the vSphere Workload Management use case, make sure only one Edge Cluster is deployed on this Overlay TZ.

Compute hosts don’t require a VLAN-backed TZ as they leverage the vSphere DVS for VLAN networking (from vSphere 7, we have a Converged VDS).
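As a quick check, here is a minimal sketch against the NSX Manager API that lists the Transport Zones and their type. The Manager FQDN and credentials are placeholders for this lab.

import requests
requests.packages.urllib3.disable_warnings()             # lab only - self-signed certs

NSX = "https://nsx-wld.vxplanet.local"                    # hypothetical NSX Manager FQDN
AUTH = ("admin", "********")

tzs = requests.get(f"{NSX}/api/v1/transport-zones", auth=AUTH, verify=False).json()
for tz in tzs.get("results", []):
    print(tz["display_name"], "-", tz["transport_type"])  # OVERLAY or VLAN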


Uplink Profiles and Named Teaming Policies

The Uplink Profile sets the Edge TEP VLAN (108), the TEP mode (Multi-TEP) and the Named Teaming Policies for deterministic peering.

The ‘Default Teaming Policy’ is set to ‘Load Balance Source’ over the two active uplink interfaces, which gives us Multi-TEP. The Uplink1 and Uplink2 logical constructs here map to fp-eth0 and fp-eth1 in the Transport Node configuration wizard.

Named Teaming Policies are used on the T0 Uplink VLAN Logical Segments to achieve deterministic eBGP peering with the Leafs over pre-determined uplink interfaces (uplink1 or uplink2). The mapping is shown below, followed by a sketch of the uplink profile payload.

VLAN 106 T0 Uplink Traffic -> Edge uplink1 (Policy applied to the VLAN LS)

VLAN 107 T0 Uplink Traffic -> Edge uplink2 (Policy applied to the VLAN LS)
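VCF already creates this uplink profile for us, but for clarity, the sketch below shows roughly what such a profile looks like when pushed through the NSX Manager API. The profile and teaming names are hypothetical, and the schema should be validated against the NSX-T 3.0 API guide before reuse.

import requests
requests.packages.urllib3.disable_warnings()              # lab only - self-signed certs

NSX = "https://nsx-wld.vxplanet.local"                     # hypothetical NSX Manager FQDN
AUTH = ("admin", "********")

profile = {
    "resource_type": "UplinkHostSwitchProfile",
    "display_name": "edge-uplink-profile",                 # hypothetical name
    "transport_vlan": 108,                                  # Edge TEP VLAN
    "teaming": {                                            # default policy -> Multi-TEP
        "policy": "LOADBALANCE_SRCID",
        "active_list": [{"uplink_name": "uplink1", "uplink_type": "PNIC"},
                        {"uplink_name": "uplink2", "uplink_type": "PNIC"}]},
    "named_teamings": [                                     # deterministic T0 uplink pinning
        {"name": "uplink1-only", "policy": "FAILOVER_ORDER",
         "active_list": [{"uplink_name": "uplink1", "uplink_type": "PNIC"}]},
        {"name": "uplink2-only", "policy": "FAILOVER_ORDER",
         "active_list": [{"uplink_name": "uplink2", "uplink_type": "PNIC"}]}],
}
requests.post(f"{NSX}/api/v1/host-switch-profiles", json=profile, auth=AUTH, verify=False)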


The Named Teaming Policies are applied to the VLAN TZ of which the Edge nodes are members.


Transport Node Configuration

A single NVDS is deployed, which is part of both Transport Zones. Edge TEP IP addresses are statically configured from the inputs specified in the VCF workflow. As an FYI, host TEP IPs are assigned from an external DHCP server (also as part of the VCF workflow).
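A quick way to confirm the static Edge TEP assignment is to pull the transport node objects from the Manager API and look at the ip_assignment_spec, as in the sketch below. The field names are assumed from the NSX-T 3.0 schema, and the FQDN and credentials are placeholders.

import requests
requests.packages.urllib3.disable_warnings()               # lab only - self-signed certs

NSX = "https://nsx-wld.vxplanet.local"                      # hypothetical NSX Manager FQDN
AUTH = ("admin", "********")

nodes = requests.get(f"{NSX}/api/v1/transport-nodes", auth=AUTH, verify=False).json()
for node in nodes.get("results", []):
    if node.get("node_deployment_info", {}).get("resource_type") != "EdgeNode":
        continue                                            # only interested in the Edge nodes
    for hs in node["host_switch_spec"]["host_switches"]:
        spec = hs.get("ip_assignment_spec", {})
        # Expect a StaticIpListSpec carrying the TEP IPs entered in the VCF workflow
        print(node["display_name"], spec.get("resource_type"), spec.get("ip_list"))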


Edge Cluster

Both Edge nodes are added to the Edge Cluster.


T0 Uplink VLAN Logical Segments

Two Logical Segments are created for the T0 Uplinks over VLANs 106 and 107. Named Teaming Policies created earlier are applied here.
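For reference, a VLAN segment pinned to a named teaming policy looks roughly like the Policy API sketch below. The segment name, teaming policy name and transport zone UUID are placeholders from this lab, not the exact values VCF generates.

import requests
requests.packages.urllib3.disable_warnings()                # lab only - self-signed certs

NSX = "https://nsx-wld.vxplanet.local"                       # hypothetical NSX Manager FQDN
AUTH = ("admin", "********")

segment = {
    "display_name": "t0-uplink-vlan106",                     # hypothetical segment name
    "vlan_ids": ["106"],
    "transport_zone_path": "/infra/sites/default/enforcement-points/default/"
                           "transport-zones/<edge-vlan-tz-uuid>",   # placeholder UUID
    "advanced_config": {"uplink_teaming_policy_name": "uplink1-only"},
}
requests.patch(f"{NSX}/policy/api/v1/infra/segments/t0-uplink-vlan106",
               json=segment, auth=AUTH, verify=False)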


T0 Gateway

The T0 Gateway is deployed in Active-Active mode as specified in the workflow. There are four interfaces (two on Edge node 1 and two on Edge node 2) over VLANs 106 and 107, which peer with the Leaf switches over BGP.
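To see the four uplink interfaces and which Edge node each one lives on, the Tier-0 locale-services can be queried through the Policy API, as in the sketch below. The Tier-0 and locale-services IDs are placeholders; list /policy/api/v1/infra/tier-0s to find the ones VCF created.

import requests
requests.packages.urllib3.disable_warnings()                 # lab only - self-signed certs

NSX = "https://nsx-wld.vxplanet.local"                        # hypothetical NSX Manager FQDN
AUTH = ("admin", "********")
T0, LS = "vcf-t0-gateway", "default"                          # hypothetical Tier-0 / locale-services IDs

url = f"{NSX}/policy/api/v1/infra/tier-0s/{T0}/locale-services/{LS}/interfaces"
for intf in requests.get(url, auth=AUTH, verify=False).json().get("results", []):
    subnets = [f"{s['ip_addresses'][0]}/{s['prefix_len']}" for s in intf.get("subnets", [])]
    print(intf["display_name"], intf["type"], subnets, intf.get("edge_path"))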


BGP Configuration

BGP neighborship with the Leafs is established over VLANs 106 and 107. Leaf 1 peers with the Edges over VLAN 106 and Leaf 2 peers over VLAN 107.

BFD is not enabled as part of the workflow; we will set it up manually towards the end.
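The same peering can also be cross-checked from the Policy API by listing the BGP neighbors under the Tier-0 locale-services, as sketched below (the Tier-0 / locale-services IDs and credentials are placeholders).

import requests
requests.packages.urllib3.disable_warnings()                 # lab only - self-signed certs

NSX = "https://nsx-wld.vxplanet.local"                        # hypothetical NSX Manager FQDN
AUTH = ("admin", "********")
T0, LS = "vcf-t0-gateway", "default"                          # hypothetical IDs

url = f"{NSX}/policy/api/v1/infra/tier-0s/{T0}/locale-services/{LS}/bgp/neighbors"
for nbr in requests.get(url, auth=AUTH, verify=False).json().get("results", []):
    print(nbr["neighbor_address"], "remote AS", nbr["remote_as_num"])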


Let’s confirm this. This is the output from Edge Node 1.


This is the output from Leaf Switch 1.


Additional Considerations to be aware of

First

If the use case for the VI Workload domain and Edge Cluster is to support Workload Management in vSphere 7, be aware of the default ‘Deny’ Route-map that is applied to the Redistribution criteria in the BGP process of the Tier-0 Gateway. This prevents all prefixes from being advertised to the Leaf Switches.

For more details, please read the official documentation below:

https://docs.vmware.com/en/VMware-Cloud-Foundation/4.0/com.vmware.vcf.admin.doc_40/GUID-716D6CA0-FD58-44EA-BF5B-504FB947D0ED.html


Since our scenario is to support Workload Management, we will replace this route-map with one that permits the Ingress and Egress CIDRs to/from the Tenants. The Ingress / Egress CIDRs are carved out of the 192.168.105.0/24 block.
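The same change can be scripted against the Policy API. The sketch below creates a prefix list for 192.168.105.0/24 (permitting longer matches up to /32), wraps it in a permit route-map, and points the Tier-0 redistribution rule at it. The object names, the redistribution types and the Tier-0 / locale-services IDs are assumptions for this lab; check the existing redistribution rules first so nothing VCF or Workload Management depends on gets overwritten.

import requests
requests.packages.urllib3.disable_warnings()                 # lab only - self-signed certs

NSX = "https://nsx-wld.vxplanet.local"                        # hypothetical NSX Manager FQDN
AUTH = ("admin", "********")
T0, LS = "vcf-t0-gateway", "default"                          # hypothetical IDs
BASE = f"{NSX}/policy/api/v1/infra/tier-0s/{T0}"

# 1. Prefix list matching the Ingress/Egress block and everything carved out of it
plist = {"prefixes": [{"network": "192.168.105.0/24", "le": 32, "action": "PERMIT"}]}
requests.patch(f"{BASE}/prefix-lists/k8s-ingress-egress", json=plist, auth=AUTH, verify=False)

# 2. Route map that permits the prefix list
rmap = {"entries": [{"action": "PERMIT",
                     "prefix_list_matches": [f"/infra/tier-0s/{T0}/prefix-lists/k8s-ingress-egress"]}]}
requests.patch(f"{BASE}/route-maps/k8s-permit-map", json=rmap, auth=AUTH, verify=False)

# 3. Attach the route map to the redistribution rule (replacing the default deny map)
redist = {"route_redistribution_config": {"redistribution_rules": [{
    "name": "k8s-redistribution",                             # assumed rule name
    "route_redistribution_types": ["TIER1_CONNECTED", "TIER1_NAT", "TIER1_LB_VIP"],
    "route_map_path": f"/infra/tier-0s/{T0}/route-maps/k8s-permit-map"}]}}
requests.patch(f"{BASE}/locale-services/{LS}", json=redist, auth=AUTH, verify=False)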


Second

Our Ingress / Egress CIDR pool for Tenants is 192.168.105.0/24; /32 addresses are carved out of it for use in SNAT rules and Load Balancer VIPs. These also end up being advertised into BGP as /32 prefixes, which results in an unnecessarily large number of entries on the upstream switches and routers. Below is the screengrab from Leaf 1.


Let’s summarize those routes at the T0 Gateway and confirm that only the Summary address is advertised.
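A minimal sketch of that summarization through the Policy API is below, assuming the route_aggregations field on the Tier-0 BGP configuration (summary_only suppresses the individual /32s). The Tier-0 / locale-services IDs and credentials are placeholders.

import requests
requests.packages.urllib3.disable_warnings()                 # lab only - self-signed certs

NSX = "https://nsx-wld.vxplanet.local"                        # hypothetical NSX Manager FQDN
AUTH = ("admin", "********")
T0, LS = "vcf-t0-gateway", "default"                          # hypothetical IDs

agg = {"route_aggregations": [{"prefix": "192.168.105.0/24", "summary_only": True}]}
requests.patch(f"{NSX}/policy/api/v1/infra/tier-0s/{T0}/locale-services/{LS}/bgp",
               json=agg, auth=AUTH, verify=False)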


Third

When using an Active-Active, ECMP-enabled T0 Gateway for Workload Management in vSphere, there is a chance we may hit ingress/egress issues with pod networking. I had this issue where pods were unable to pull images from external repositories even though K8S network policies were configured for ingress/egress to/from the pod networks. We might need to relax the URPF mode on the T0 Gateway interfaces to fix this.
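Relaxing URPF can be done per interface from the UI, or scripted as in the sketch below, which walks every interface under the Tier-0 locale-services and sets urpf_mode to NONE. The Tier-0 / locale-services IDs and credentials are placeholders.

import requests
requests.packages.urllib3.disable_warnings()                 # lab only - self-signed certs

NSX = "https://nsx-wld.vxplanet.local"                        # hypothetical NSX Manager FQDN
AUTH = ("admin", "********")
T0, LS = "vcf-t0-gateway", "default"                          # hypothetical IDs

url = f"{NSX}/policy/api/v1/infra/tier-0s/{T0}/locale-services/{LS}/interfaces"
for intf in requests.get(url, auth=AUTH, verify=False).json().get("results", []):
    requests.patch(f"{url}/{intf['id']}", json={"urpf_mode": "NONE"}, auth=AUTH, verify=False)
    print("URPF relaxed on", intf["display_name"])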


Fourth

BFD is not enabled in the BGP process by the VCF workflow, so it needs to be enabled manually. Note that it has to be enabled on both the Leaf switches and the T0 Gateway.
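On the NSX side, BFD is a per-neighbor setting on the Tier-0 BGP configuration. A sketch of enabling it on every neighbor is below; the 500 ms interval and multiplier of 3 are assumptions for this lab, and the Leaf-side BFD timers must be configured to match.

import requests
requests.packages.urllib3.disable_warnings()                 # lab only - self-signed certs

NSX = "https://nsx-wld.vxplanet.local"                        # hypothetical NSX Manager FQDN
AUTH = ("admin", "********")
T0, LS = "vcf-t0-gateway", "default"                          # hypothetical IDs

url = f"{NSX}/policy/api/v1/infra/tier-0s/{T0}/locale-services/{LS}/bgp/neighbors"
for nbr in requests.get(url, auth=AUTH, verify=False).json().get("results", []):
    requests.patch(f"{url}/{nbr['id']}",
                   json={"bfd": {"enabled": True, "interval": 500, "multiple": 3}},
                   auth=AUTH, verify=False)
    print("BFD enabled towards", nbr["neighbor_address"])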


This is the BFD neighbor table from Leaf 1; it should see both Edge nodes.


Fifth

Create a default static route on the T0 Gateway for internet access. Since we leverage the Egress CIDR for this, make sure that this subnet (192.168.105.0/24 in our case) has the necessary NAT mappings on the external firewall / gateway for internet access.
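A sketch of that default route through the Policy API is below. The two next-hop addresses are hypothetical Leaf SVI IPs on the uplink VLANs; with a next hop on each uplink VLAN, both Edge uplinks remain usable for egress.

import requests
requests.packages.urllib3.disable_warnings()                 # lab only - self-signed certs

NSX = "https://nsx-wld.vxplanet.local"                        # hypothetical NSX Manager FQDN
AUTH = ("admin", "********")
T0 = "vcf-t0-gateway"                                         # hypothetical Tier-0 id

route = {"display_name": "default-route",
         "network": "0.0.0.0/0",
         "next_hops": [{"ip_address": "192.168.106.1", "admin_distance": 1},   # hypothetical Leaf SVIs
                       {"ip_address": "192.168.107.1", "admin_distance": 1}]}
requests.patch(f"{NSX}/policy/api/v1/infra/tier-0s/{T0}/static-routes/default-route",
               json=route, auth=AUTH, verify=False)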


Sixth

Optionally, enable SSH access to the Edge nodes if required.


We are now set to consume the Edge Cluster for our workloads. In the next article, we will use the VCF 4.0 Workflow to enable vSphere 7 Workload Management on the VI WLD and consume the Edge Cluster that we built. Stay tuned.

I hope this article was informative.

Thanks for reading

Missed Part 1? Here it is:

Part 1 -> https://vxplanet.com/2020/04/25/nsx-t-3-0-edge-cluster-automated-deployment-and-architecture-in-vcf-4-0-part-1/


6 thoughts on “NSX-T 3.0 Edge Cluster Automated Deployment and Architecture in VCF 4.0 – Part 2”

  1. Is there a reason you create a manual static route on the T0 as opposed to simply having the leaf switches (or something upstream) advertise it over BGP? Given the T0 Leaf ECMP options, it seems doing that over BGP would better handle failure conditions vs a static route over one of the specific uplink subnets.

  2. Thanks a lot for this tutorial part; it helped me fix the problem of deploying Tanzu in my environment (I was getting an error configuring the cluster NIC on the master VM).

  3. Can you elaborate on the following statement you included under the Transport Zone section: “If the VI Workload domain is used for vSphere Workload Management usecase, make sure that we deploy only one edge cluster on this Overlay TZ”?

    I ask because if I deploy a T1 within a WLD, I wouldn’t want to share the T0 Edge Cluster with a T1 Gateway because of the impact it would have around N/S ECMP (which you’ve covered in a previous blog post). I would want to deploy a different Edge Cluster for the T1, yet your comment suggests not to do that.

    Any clarification you can provide would be greatly appreciated. Thank you!

    1. Hi Luis, one Edge cluster per Overlay Transport Zone was actually a limitation in VCF 4.0 to support Workload Management (vSphere with Tanzu), which was lifted in subsequent releases. Currently, multiple Edge clusters per TZ are supported. Thanks!
