Resizing NSX-T Edge node VMs


NSX-T Edge nodes come in two form factors, VM and bare metal, both leveraging DPDK (Data Plane Development Kit) acceleration for faster packet processing. The choice of form factor depends on the use case, so it is good to understand the workload traffic behavior and the centralized services requirements before finalizing the Edge deployment. Different form factors have different upper limits, with bare metal Edges having the highest.

If you would like to see a comparison between bare metal Edges and VM Edges, visit my earlier post below:

https://vxplanet.com/2019/06/13/nsx-t-edges-baremetal-vs-vm-comparison/

I recently needed to resize an Edge VM cluster to support an Enterprise PKS deployment, which requires Edge nodes in Large size. The already deployed Edge VM cluster was in Medium size, which is not supported. Hence I thought to blog this as a reference for anyone planning a resize.

The resizing procedure works like a migration: we deploy new Edge nodes with the desired size and then migrate the stateful services to them.

In this article, we will look at the different sizing for Edge VMs along with the procedure to resize an Edge node VM post deployment.

A note before proceeding

  • This procedure can be followed only when the new Edge nodes have the same Transport zone and NVDS configuration as the Edge nodes to be replaced.
  • It can also be followed to migrate to bare metal Edges, but again the Transport zones and NVDS configuration should be identical.
  • It can’t be used for Edge node topology changes. For example, migrating from Single-TEP Multi-NVDS Edges to Single-NVDS Multi-TEP Edges is not possible with this procedure.
  • We can have Edges with different form factors inside an Edge cluster, but a symmetric form factor is recommended.
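The first three notes boil down to one check: the old and the new Edge node must carry the same NVDS layout. Below is a minimal Python sketch of that comparison. The `HostSwitch` model and the switch names are made up for illustration and are not NSX-T API objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSwitch:
    """One NVDS on an Edge node (hypothetical, simplified model)."""
    name: str
    transport_zones: frozenset  # names of the Transport zones on this NVDS
    tep_count: int              # 0 for a VLAN-only NVDS


def can_replace(old_switches, new_switches):
    """True when both nodes have the same set of NVDS definitions.

    Order does not matter; name, Transport zones and TEP count must match.
    """
    return set(old_switches) == set(new_switches)


# Illustrative layouts: multi-NVDS with a single TEP vs. single-NVDS with two TEPs.
multi_nvds = [
    HostSwitch("nvds-overlay", frozenset({"TZ_Overlay"}), 1),
    HostSwitch("nvds-uplink1", frozenset({"TZ_Uplink1"}), 0),
    HostSwitch("nvds-uplink2", frozenset({"TZ_Uplink2"}), 0),
]
single_nvds = [
    HostSwitch("nvds-all", frozenset({"TZ_Overlay", "TZ_Uplink1", "TZ_Uplink2"}), 2),
]

print(can_replace(multi_nvds, list(reversed(multi_nvds))))  # same layout
print(can_replace(multi_nvds, single_nvds))                 # topology change
```

The second call models the topology change from the third note, which this procedure cannot handle.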

Introduction

Edge node VMs are available in 3 sizes: Small, Medium and Large. For production deployments we need at least Medium, and Large is recommended. The choice depends on the number and type of stateful services that we are going to deploy on the T1 / T0 Gateways. For example, if our workloads require a ‘Large’ Loadbalancer, then we need a ‘Large’ Edge VM. Make sure to go through the Configuration Maximums page to understand the upper limits before deploying the Edge VMs.
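The sizing decision can be sketched as a small Python helper. This is only an illustration: the requirement labels are made up, and the table lists just the two cases named in this post (a ‘Large’ Loadbalancer and Enterprise PKS). Always check the Configuration Maximums page for the real limits.

```python
from enum import IntEnum


class EdgeSize(IntEnum):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


# Minimum Edge VM size per requirement. Only the two cases named in this
# post are listed; the keys are made-up labels, not NSX-T identifiers.
MIN_SIZE = {
    "large_loadbalancer": EdgeSize.LARGE,
    "enterprise_pks": EdgeSize.LARGE,
}


def required_size(requirements, floor=EdgeSize.MEDIUM):
    """Largest size demanded by any requirement, never below the floor.

    The default floor is Medium, the production minimum mentioned above.
    """
    return max([floor] + [MIN_SIZE[r] for r in requirements])


def needs_resize(current, requirements):
    """True when the deployed size is smaller than what is required."""
    return current < required_size(requirements)


print(needs_resize(EdgeSize.MEDIUM, ["enterprise_pks"]))
```

For the environment in this post (Medium Edges, Enterprise PKS), the helper reports that a resize is needed.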

https://configmax.vmware.com/repcomp/compare

Each Edge VM sizing has separate Compute requirements as mentioned below:

https://docs.vmware.com/en/VMware-NSX-T-Data-Center/2.5/installation/GUID-22F87CA8-01A9-4F2E-B7DB-9350CA60EA4E.html

1

Current Environment

I had a number of Edge clusters for different environments, but the highlighted Edge nodes are the ones we will use to demo this blog post.

100.png

Both Edge VMs (BGGWEdge01 & BGGWEdge02) are in Medium size and are a part of Edge Cluster BGGWCluster01 configured for 3 Transport Zones:

  • TZ_PKS_Overlay (Overlay Transport Zone)
  • TZ_Edges_Uplink1 (VLAN TZ for Edge Uplinks)
  • TZ_Edges_Uplink2 (VLAN TZ for Edge Uplinks)

Each Edge VM has 3 NVDS, each configured on a separate Transport zone from the list above.

101

Let’s check the Cluster Keepalive status from both Edge nodes to make sure that BFD Keepalives are exchanged between them over the TEP interface as well as Management interface.

102103

The environment uses a 2-tier topology with Stateful services deployed.

  • Tier 0 Gateway (LR_T0_BG01) deployed in Active-Active mode with 4 uplinks, leveraging this Edge Cluster.
  • Each Edge node hosts a T0 SR Construct with two External uplinks – one on VLAN 60 and the other on VLAN 70
  • The T0 Gateway establishes eBGP peering with two Dell EMC S5048-ON Leaf switches. We will have 4 eBGP peerings in total (2 on each T0 SR Construct)
  • A Tier 1 Gateway (LR_T1_BGIntranet) with an SR Construct and uplinked to the Tier 0 Gateway
  • Another Tier 1 Gateway (LR_T1_BG_Dev) with an SR Construct and uplinked to the Tier 0 Gateway
  • A ‘Small’ Loadbalancer deployed on the Tier 1 Gateway (LR_T1_BG_Dev), which is the front end to Web apps on a Logical Segment attached to the same T1 Gateway (LR_T1_BG_Dev)

This means that each Edge node will have the below SR Constructs:

  • SR for Tier 0 Gateway (LR_T0_BG01)
  • SR for Tier 1 Gateway (LR_T1_BGIntranet)
  • SR for Tier 1 Gateway (LR_T1_BG_Dev)
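To make the counts above easy to verify, here is the same topology written as plain Python data. It is only a sketch: it assumes each T0 uplink carries exactly one eBGP peering, which is how the total of 4 is reached.

```python
EDGE_NODES = ["BGGWEdge01", "BGGWEdge02"]
UPLINK_VLANS = [60, 70]

# Gateways that own an SR Construct, with their HA mode.
GATEWAYS = {
    "LR_T0_BG01": "active-active",
    "LR_T1_BGIntranet": "active-standby",
    "LR_T1_BG_Dev": "active-standby",
}


def srs_per_node(nodes, gateways):
    """Each gateway with an SR places one SR instance on every cluster member."""
    return {node: sorted(gateways) for node in nodes}


def t0_uplinks(nodes, vlans):
    """One T0 external uplink per (Edge node, VLAN) pair."""
    return [(node, vlan) for node in nodes for vlan in vlans]


placement = srs_per_node(EDGE_NODES, GATEWAYS)
uplinks = t0_uplinks(EDGE_NODES, UPLINK_VLANS)

print(placement["BGGWEdge01"])
print(len(uplinks), "uplinks, and the same number of eBGP peerings (assumed one per uplink)")
```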

Pre-emption is enabled for the Tier 1 SR Constructs, so that after the Edge node migration process, the Active-Standby SR Construct placement is retained on the newer Edge nodes. The DR Constructs for the Gateways will always be available across all the Transport nodes in the Transport zone.

104105

This is what the SR placement on the Edge nodes looks like:

2.png

The same placement is shown below:

106107108

These are the T0 uplinks across the two Edge nodes.

115.png

This is the eBGP neighborship established by each T0 SR Construct with separate Leaf Switches. This output is from BGGWEdge01.

109

This is the Loadbalancer deployed on Tier 1 Gateway ‘LR_T1_BG_Dev’

111.png

Deploying the Large Edge Appliances

The new Edge appliances are deployed in ‘Large’ size and configured on the same Transport zones as the previous edge nodes (to be replaced). This is because, for the below procedure to work, the new and old Edge VMs should have the same NVDS Configuration applied.

The new Edge nodes are named BGGWEdge05_Large & BGGWEdge06_Large.

112

113

Once both Large Edge VMs are deployed, confirm that they are in a healthy state. DO NOT add them to the Edge Cluster.
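The two conditions above (healthy, and not a member of any Edge cluster) can be sketched as a check over data fetched from NSX-T Manager. The dictionary shapes are assumptions modelled on the Manager REST API: a transport node state of `success`, and Edge clusters carrying a `members` list with `transport_node_id` entries. Verify the field names against the API guide for your version.

```python
def ready_for_replacement(node_id, node_state, edge_clusters):
    """True when the node is healthy and not yet a member of any Edge cluster.

    node_state    -- assumed to be the 'state' value reported for the transport node
    edge_clusters -- assumed to be a list of Edge cluster dicts with a 'members' list
    """
    in_cluster = any(
        member["transport_node_id"] == node_id
        for cluster in edge_clusters
        for member in cluster.get("members", [])
    )
    return node_state == "success" and not in_cluster


# Illustrative data; the IDs are placeholders, not real transport node UUIDs.
clusters = [
    {
        "display_name": "BGGWCluster01",
        "members": [
            {"member_index": 0, "transport_node_id": "id-BGGWEdge01"},
            {"member_index": 1, "transport_node_id": "id-BGGWEdge02"},
        ],
    }
]

print(ready_for_replacement("id-BGGWEdge05_Large", "success", clusters))
```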

114

Migrating the Medium Edge Nodes to Large Form Factor

First Edge node, BGGWEdge01

We will put one of the Edge nodes in the Edge cluster into maintenance mode. This shuts down the dataplane services on the Edge node and triggers a failover for any Active SR Constructs on it, which causes a brief interruption in traffic for the respective Tier 1 Gateway. Let’s put BGGWEdge01 into maintenance mode.
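This post uses the UI for this step. If you prefer to script it, the NSX-T Manager REST API exposes maintenance mode as an action on the transport node. The sketch below only builds the request and does not send it. The manager hostname and node ID are placeholders, authentication is left out, and the action name should be confirmed against the API guide for your NSX-T version.

```python
import urllib.request


def maintenance_mode_request(manager, node_id, enter=True):
    """Build (but do not send) the POST that toggles maintenance mode."""
    action = "enter_maintenance_mode" if enter else "exit_maintenance_mode"
    url = f"https://{manager}/api/v1/transport-nodes/{node_id}?action={action}"
    return urllib.request.Request(url, method="POST")


# Placeholder manager FQDN and node ID.
req = maintenance_mode_request("nsxmgr.example.local", "id-BGGWEdge01")
print(req.get_method(), req.full_url)
```

To actually send it, you would add credentials and pass the request to `urllib.request.urlopen`.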

116 

All the Active SR Constructs are now on the Edge node BGGWEdge02.

117

118

119

All services are reachable via the edge node BGGWEdge02. Let’s test the reachability of the Loadbalancer VIP.
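A quick way to script this reachability test is a plain TCP connect to the VIP. This is a minimal sketch: it only proves that the TCP handshake completes, not that the application behind the Loadbalancer answers correctly. The VIP address and port in the comment are placeholders.

```python
import socket


def vip_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


# Example call (placeholder VIP and port):
# vip_reachable("192.0.2.10", 80)
```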

120.png

Now under the Edge Cluster Actions, use the ‘Replace Edge Cluster Member‘ to migrate the services on Edge node BGGWEdge01 to Large Edge node BGGWEdge05_Large.
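The same replacement is also available through the NSX-T Manager REST API as the `replace_transport_node` action on the Edge cluster. The sketch below builds the request without sending it. The hostname and IDs are placeholders, authentication is left out, and the payload fields (`member_index`, `transport_node_id`) should be checked against the API guide for your version.

```python
import json
import urllib.request


def member_index_of(edge_cluster, transport_node_id):
    """Find the cluster slot currently held by the given transport node."""
    for member in edge_cluster["members"]:
        if member["transport_node_id"] == transport_node_id:
            return member["member_index"]
    raise ValueError(f"{transport_node_id} is not a member of this Edge cluster")


def replace_member_request(manager, edge_cluster, old_node_id, new_node_id):
    """Build (but do not send) the POST that swaps a cluster member."""
    body = {
        "member_index": member_index_of(edge_cluster, old_node_id),
        "transport_node_id": new_node_id,
    }
    url = (
        f"https://{manager}/api/v1/edge-clusters/"
        f"{edge_cluster['id']}?action=replace_transport_node"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder cluster object, as it might be returned by the Manager API.
cluster = {
    "id": "id-BGGWCluster01",
    "members": [
        {"member_index": 0, "transport_node_id": "id-BGGWEdge01"},
        {"member_index": 1, "transport_node_id": "id-BGGWEdge02"},
    ],
}
req = replace_member_request(
    "nsxmgr.example.local", cluster, "id-BGGWEdge01", "id-BGGWEdge05_Large"
)
print(req.full_url)
print(req.data.decode("utf-8"))
```

Keeping the member index of the old node is what lets the new node take over the same slot in the cluster.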

121

122

The new Edge node BGGWEdge05_Large should now replace the Edge node BGGWEdge01 and become a part of the Edge Cluster BGGWCluster01.

123.png

It should now have all the DR & SR Constructs of T0 and T1 Gateways.

124

It should now establish a Geneve tunnel with its cluster member BGGWEdge02. Let’s confirm that BFD Keepalives are exchanged between them over the TEP interface as well as the Management interface.

125.png

Let’s confirm that the new Edge node is available for the T1 & T0 Gateways and that the Active-Standby placement for the SR Constructs is retained as per the replaced Edge node.
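If you record the SR placement before and after the replacement, this check reduces to a dictionary comparison. The sketch below assumes a simple hand-built mapping of gateway to active/standby node. Which node is Active for each Tier 1 Gateway in the example is illustrative only.

```python
def placement_retained(before, after, replaced):
    """True when 'after' equals 'before' with the replaced nodes renamed.

    before/after -- {gateway: {"active": node, "standby": node}}
    replaced     -- {old_node: new_node}
    """
    expected = {
        gateway: {role: replaced.get(node, node) for role, node in roles.items()}
        for gateway, roles in before.items()
    }
    return expected == after


# Illustrative placement, not taken from the screenshots.
before = {
    "LR_T1_BGIntranet": {"active": "BGGWEdge01", "standby": "BGGWEdge02"},
    "LR_T1_BG_Dev": {"active": "BGGWEdge02", "standby": "BGGWEdge01"},
}
after = {
    "LR_T1_BGIntranet": {"active": "BGGWEdge05_Large", "standby": "BGGWEdge02"},
    "LR_T1_BG_Dev": {"active": "BGGWEdge02", "standby": "BGGWEdge05_Large"},
}

print(placement_retained(before, after, {"BGGWEdge01": "BGGWEdge05_Large"}))
```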

126

127

128

Let’s confirm that the new Edge node is reflected under the T0 Uplink Interfaces.

129.png

The new Edge node should now re-establish eBGP sessions with the Leaf Switches.

131.png

Now let’s do a Traceflow from an Uplink interface on the new Edge node BGGWEdge05_Large to a VM attached to the T1 Gateway ‘LR_T1_BG_Dev’. We can see that the traffic crosses the necessary DR & SR Constructs on the Edge before being tunneled to the Compute Transport node. This confirms the inter-tier connectivity.

130.png

Migrating the Second Edge node, BGGWEdge02 to Large Form factor

The same procedure as described above can be followed. 
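Because the steps are identical for each node, the whole resize can be written down as a rolling plan, one Edge node at a time. The small Python sketch below just generates that checklist from the old/new node pairs used in this post. It does not talk to NSX-T.

```python
def migration_plan(pairs):
    """Return the ordered checklist for replacing each (old, new) node pair."""
    steps = []
    for old, new in pairs:
        steps += [
            f"Put {old} into maintenance mode",
            f"Verify services are reachable via the remaining cluster member",
            f"Replace Edge cluster member {old} with {new}",
            f"Verify {new}: BFD keepalives, SR placement, T0 uplinks, eBGP sessions, Traceflow",
        ]
    return steps


plan = migration_plan([
    ("BGGWEdge01", "BGGWEdge05_Large"),
    ("BGGWEdge02", "BGGWEdge06_Large"),
])
for number, step in enumerate(plan, start=1):
    print(f"{number}. {step}")
```

Finishing all checks on the first node before touching the second keeps one working cluster member in the data path at all times.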

134.png

Once the migration is completed, we should see the same Edge cluster with the two Large Edge nodes. The cluster now has additional capacity available for the stateful services.

135.png

Final Tasks and Cleanup

At this point, the old Edge nodes are not a part of any Edge cluster. We should not see any DR or SR Constructs on them.

133.png

The old Edge nodes can either be re-purposed as a new Edge cluster or cleaned up from NSX-T Manager / vCenter. 

I hope the article was informative. Thanks for reading

 

vxplanet

 
