NSX-T Edge Nodes come in two form factors, VM and Baremetal, both leveraging DPDK (Data Plane Development Kit) acceleration for faster packet processing. Which form factor to use depends on our use case requirements, and it is good to understand the workload traffic behavior and the centralized services requirement before finalizing the Edge deployment form factor. This is because different form factors have different upper limits, with Baremetal Edges having the highest.
If you would like to see a comparison between baremetal Edges and VM Edges, then visit my earlier post below:
https://vxplanet.com/2019/06/13/nsx-t-edges-baremetal-vs-vm-comparison/
I recently needed to resize an Edge VM cluster to support an Enterprise PKS deployment, as it required Edge nodes to be in Large size. However, the already deployed Edge VM cluster was in Medium size, which was not supported. Hence I thought to blog this as a reference for anyone planning a resize.
The Resizing procedure can be realized like a Migration. We will deploy new edge nodes with the desired size and then migrate stateful services to it.
In this article, we will look at the different sizing for Edge VMs along with the procedure to resize an Edge node VM post deployment.
A note before proceeding
- This procedure can be followed only when the new Edge nodes have the same Transport zone and NVDS Configuration as the Edge nodes to be replaced.
- This can also be followed to migrate to Baremetal Edges, but again the transport zones and NVDS Configuration should be identical.
- This can’t be used for Edge node topology changes. For example, migrating from Single-TEP Multi-NVDS Edges to Single-NVDS Multi-TEP Edges using this procedure is not possible.
- We can have Edges with different form factors inside an Edge cluster, but it is recommended to keep the form factor symmetric.
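The first two notes can be turned into a quick pre-check. Below is a minimal Python sketch, assuming each Edge node is described as a dictionary listing its NVDS names and the transport zones attached to each. The dictionary layout, the function names and the NVDS names are my own assumptions for illustration; only the Edge and transport zone names come from the environment used later in this post.

```python
from typing import Dict, List, Set


def nvds_layout(node: Dict) -> Dict[str, Set[str]]:
    """Map each NVDS name on a node to the set of transport zones it serves."""
    return {sw["name"]: set(sw["transport_zones"]) for sw in node["switches"]}


def replacement_blockers(old_node: Dict, new_node: Dict) -> List[str]:
    """List the reasons why new_node cannot replace old_node (empty if none)."""
    old_layout = nvds_layout(old_node)
    new_layout = nvds_layout(new_node)
    problems = []
    for name in sorted(old_layout.keys() | new_layout.keys()):
        if name not in new_layout:
            problems.append(f"{name} is missing on {new_node['name']}")
        elif name not in old_layout:
            problems.append(f"{name} exists only on {new_node['name']}")
        elif old_layout[name] != new_layout[name]:
            problems.append(f"{name} is attached to different transport zones")
    return problems


# NVDS names below are hypothetical; the transport zone names are from this post.
medium_edge = {
    "name": "BGGWEdge01",
    "switches": [
        {"name": "nvds-overlay", "transport_zones": ["TZ_PKS_Overlay"]},
        {"name": "nvds-uplink1", "transport_zones": ["TZ_Edges_Uplink1"]},
        {"name": "nvds-uplink2", "transport_zones": ["TZ_Edges_Uplink2"]},
    ],
}
large_edge = dict(medium_edge, name="BGGWEdge05_Large")

print(replacement_blockers(medium_edge, large_edge))
```

An empty list means the two nodes match on NVDS and transport zone layout. It does not check the TEP topology mentioned in the third note.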
Introduction
Edge node VMs are available in 3 sizes: Small, Medium and Large. For Production deployments, we would require at least a Medium size, but a Large size is recommended. The choice depends upon the number and type of stateful services that we are going to deploy on the T1 / T0 Gateways. For example, if our workloads require a ‘Large’ Loadbalancer, then we would need a ‘Large’ Edge VM. Make sure to go through the Configuration Max page to understand the upper limits before deploying the Edge VMs.
https://configmax.vmware.com/repcomp/compare
Each Edge VM sizing has separate Compute requirements as mentioned below:
Current Environment
I had a number of Edge clusters for different environments, but the highlighted Edge nodes are the ones we will use to demonstrate the procedure in this blog post.
Both Edge VMs (BGGWEdge01 & BGGWEdge02) are in Medium size and are a part of Edge Cluster BGGWCluster01 configured for 3 Transport Zones:
- TZ_PKS_Overlay (Overlay Transport Zone)
- TZ_Edges_Uplink1 (VLAN TZ for Edge Uplinks)
- TZ_Edges_Uplink2 (VLAN TZ for Edge Uplinks)
Each Edge VM has 3 NVDS, each configured on a separate Transport zone from the list above.
Let’s check the Cluster Keepalive status from both Edge nodes to make sure that BFD Keepalives are exchanged between them over the TEP interface as well as Management interface.
The environment uses a 2-tier topology with Stateful services deployed.
- Tier 0 Gateway (LR_T0_BG01) deployed in Active-Active mode with 4 uplinks leveraging this Edge Cluster.
- Each Edge node hosts a T0 SR Construct with two External uplinks, one on VLAN 60 and the other on VLAN 70.
- The T0 Gateway establishes eBGP peering with two Dell EMC S5048-ON Leaf switches. We will have 4 eBGP peerings in total (2 on each T0 SR Construct)
- A Tier 1 Gateway (LR_T1_BGIntranet) with an SR Construct and uplinked to the Tier 0 Gateway
- Another Tier 1 Gateway (LR_T1_BG_Dev) with an SR Construct and uplinked to the Tier 0 Gateway
- A ‘Small’ Loadbalancer deployed on the Tier 1 Gateway (LR_T1_BG_Dev), which front-ends the Web apps on a Logical Segment attached to the same T1 Gateway
This means that each Edge node will have the below SR Constructs:
- SR for Tier 0 Gateway (LR_T0_BG01)
- SR for Tier 1 Gateway (LR_T1_BGIntranet)
- SR for Tier 1 Gateway (LR_T1_BG_Dev)
Pre-emption is enabled for the Tier 1 SR Constructs, so that after the Edge node migration process, the Active-Standby SR Construct placement is retained on the newer Edge nodes. The DR Constructs for the Gateways will always be available across all the Transport nodes in the Transport zone.
This is what the SR placements on the Edge nodes look like:
And this is as shown below:
These are the T0 Uplinks across the two Edge nodes.
This is the eBGP neighborship established by each T0 SR Construct with separate Leaf Switches. This output is from BGGWEdge01.
This is the Loadbalancer deployed on Tier 1 Gateway ‘LR_T1_BG_Dev’
Deploying the Large Edge Appliances
The new Edge appliances are deployed in ‘Large’ size and configured on the same Transport zones as the previous edge nodes (to be replaced). This is because, for the below procedure to work, the new and old Edge VMs should have the same NVDS Configuration applied.
The new Edge nodes are named BGGWEdge05_Large & BGGWEdge06_Large.
Once both Large Edge VMs are deployed, confirm that they are in a healthy state. DO NOT add them to an Edge Cluster.
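To double-check that a new node is not a member of any Edge cluster yet, we could scan the Edge cluster definitions. This is a minimal Python sketch that assumes the clusters are available as a list of dictionaries, each with a `display_name` and a `members` list carrying `transport_node_id`, which is how I understand the NSX-T Manager API to return Edge clusters. Treat those field names as an assumption to verify for your version. The IDs are made up.

```python
from typing import Dict, List


def clusters_containing(node_id: str, edge_clusters: List[Dict]) -> List[str]:
    """Return the display names of Edge clusters that list node_id as a member."""
    return [
        cluster["display_name"]
        for cluster in edge_clusters
        if any(
            member["transport_node_id"] == node_id
            for member in cluster.get("members", [])
        )
    ]


# Hypothetical IDs; only the cluster name comes from this post.
edge_clusters = [
    {
        "display_name": "BGGWCluster01",
        "members": [
            {"member_index": 0, "transport_node_id": "id-bggwedge01"},
            {"member_index": 1, "transport_node_id": "id-bggwedge02"},
        ],
    }
]

print(clusters_containing("id-bggwedge05-large", edge_clusters))
```

An empty result for both new nodes is what we want before starting the replacement.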
Migrating the Medium Edge Nodes to Large Form Factor
First Edge node, BGGWEdge01
We will put one of the Edge nodes in the Edge cluster into maintenance mode. This will shut down the Dataplane services on the Edge node and trigger a failover for any Active SR Constructs on that Edge node. This causes a brief interruption in the traffic flow for the respective Tier 1 Gateways. Let’s put BGGWEdge01 into maintenance mode.
We will see that all the Active SR Constructs are now on the Edge node BGGWEdge02.
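The failover behaviour can be pictured with a small model. This is only an illustrative Python sketch, not how NSX-T implements it: it assumes each Tier 1 Gateway has one active and one standby SR, and that putting a node into maintenance mode hands the active role to the peer. The starting placement is hypothetical, since it depends on the environment.

```python
from typing import Dict

Placement = Dict[str, Dict[str, str]]


def after_maintenance(placements: Placement, node: str) -> Placement:
    """Swap roles for every gateway whose active SR sits on `node`."""
    result = {}
    for gateway, roles in placements.items():
        if roles["active"] == node:
            # The peer takes over; the SR on `node` is down until it is replaced.
            result[gateway] = {"active": roles["standby"], "standby": node}
        else:
            result[gateway] = dict(roles)
    return result


# Hypothetical starting placement for the two Tier 1 gateways in this post.
before = {
    "LR_T1_BGIntranet": {"active": "BGGWEdge01", "standby": "BGGWEdge02"},
    "LR_T1_BG_Dev": {"active": "BGGWEdge02", "standby": "BGGWEdge01"},
}
after = after_maintenance(before, "BGGWEdge01")
print(after)
```

The Active-Active Tier 0 Gateway is left out of the model, as both of its SRs forward traffic and the remaining one simply carries on.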
All services are reachable via the edge node BGGWEdge02. Let’s test the reachability of the Loadbalancer VIP.
Now under the Edge Cluster Actions, use ‘Replace Edge Cluster Member’ to migrate the services on Edge node BGGWEdge01 to the Large Edge node BGGWEdge05_Large.
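The same action can also be scripted. As far as I know, the NSX-T Manager API exposes it as a POST to `/api/v1/edge-clusters/<edge-cluster-id>?action=replace_transport_node` with the member index and the new transport node ID in the body. Treat the endpoint and the field names as assumptions to verify against the API guide for your version. The Python sketch below only builds the request and does not send it; authentication and certificate handling are left out, and the manager name and IDs are placeholders.

```python
import json
import urllib.request


def build_replace_request(manager: str, cluster_id: str,
                          member_index: int, new_node_id: str) -> urllib.request.Request:
    """Build (but do not send) the request that swaps one Edge cluster member."""
    url = (f"https://{manager}/api/v1/edge-clusters/{cluster_id}"
           "?action=replace_transport_node")
    body = {"member_index": member_index, "transport_node_id": new_node_id}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# All values below are placeholders, not taken from the environment in this post.
request = build_replace_request("nsxmgr.example.local", "cluster-uuid", 0, "new-node-uuid")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The member index identifies the slot of the old Edge node inside the cluster, so the new node takes over that slot rather than being added as an extra member.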
The new Edge node BGGWEdge05_Large should now replace the Edge node BGGWEdge01 and become a part of the Edge Cluster BGGWCluster01.
It should now have all the DR & SR Constructs of T0 and T1 Gateways.
It should now establish a Geneve tunnel with its cluster member BGGWEdge02. Let’s confirm that BFD Keepalives are exchanged between them over the TEP interface as well as the Management interface.
Let’s confirm that the new Edge node is available for the T1 & T0 Gateways and that the Active-Standby placement for the SR Constructs is retained as per the replaced Edge node.
Let’s confirm that the new Edge node is reflected under the T0 Uplink Interfaces.
The new Edge node should now re-establish eBGP sessions with the Leaf Switches.
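If we collect the neighbor states from each Edge node, a small check can flag anything that is not back up. The Python sketch below assumes the states have already been gathered into a list of dictionaries; that layout, the function name and the leaf switch labels are hypothetical. The expectation of two peerings per T0 SR comes from the topology described earlier.

```python
from typing import Dict, List


def bgp_problems(edges: List[str], neighbors: List[Dict],
                 expected_per_edge: int = 2) -> List[str]:
    """Report Edge nodes with missing peers or sessions that are not Established."""
    problems = []
    for edge in edges:
        peers = [n for n in neighbors if n["edge"] == edge]
        if len(peers) != expected_per_edge:
            problems.append(f"{edge}: {len(peers)} peers, expected {expected_per_edge}")
        for peer in peers:
            if peer["state"] != "Established":
                problems.append(f"{edge}: {peer['peer']} is {peer['state']}")
    return problems


# Hypothetical snapshot taken after the first Edge node has been replaced.
edges = ["BGGWEdge05_Large", "BGGWEdge02"]
neighbors = [
    {"edge": "BGGWEdge05_Large", "peer": "leaf-1", "state": "Established"},
    {"edge": "BGGWEdge05_Large", "peer": "leaf-2", "state": "Established"},
    {"edge": "BGGWEdge02", "peer": "leaf-1", "state": "Established"},
    {"edge": "BGGWEdge02", "peer": "leaf-2", "state": "Established"},
]

print(bgp_problems(edges, neighbors))
```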
Now let’s do a Traceflow from an Uplink interface on the new Edge node BGGWEdge05_Large to a VM attached to the T1 Gateway ‘LR_T1_BG_Dev’. We can see that the traffic crosses the necessary DR & SR Constructs on the Edge before being tunneled to the Compute Transport node. This confirms the inter-tier connectivity.
Migrating the Second Edge node, BGGWEdge02 to Large Form factor
The same procedure as described above can be followed.
Once the migration is completed, we should see the same Edge cluster with the two Large Edge nodes. The cluster now has additional capacity available for the stateful services.
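To summarize the flow for both nodes, here is a short Python sketch that prints the steps in order, one Edge node at a time. It is just a checklist generator. The pairing of BGGWEdge02 with BGGWEdge06_Large is my assumption based on the node names, and the function name is made up for illustration.

```python
from typing import Dict, List


def rolling_replace_plan(pairs: Dict[str, str], cluster: str) -> List[str]:
    """Expand old-to-new Edge node pairs into an ordered list of manual steps."""
    steps = []
    for old, new in pairs.items():
        steps += [
            f"Confirm {new} is healthy and not in any Edge cluster",
            f"Put {old} into maintenance mode",
            f"Verify active SRs and services on the remaining member of {cluster}",
            f"Replace {old} with {new} in {cluster}",
            f"Verify tunnels, SR placement, T0 uplinks and eBGP on {new}",
        ]
    return steps


plan = rolling_replace_plan(
    {"BGGWEdge01": "BGGWEdge05_Large", "BGGWEdge02": "BGGWEdge06_Large"},
    "BGGWCluster01",
)
for number, step in enumerate(plan, start=1):
    print(f"{number}. {step}")
```

Working on one node at a time keeps the other cluster member available to carry the traffic during each replacement.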
Final Tasks and Cleanup
At this moment, the old Edge nodes are not a part of any Edge cluster. We should not see any DR or SR Constructs on them.
The old Edge nodes can either be re-purposed as a new Edge cluster or cleaned up from NSX-T Manager / vCenter.
I hope the article was informative. Thanks for reading.