
Welcome back!
We are at Part 3 of the blog series on NSX 4.0.1 Stateful Active-Active Gateways. In Part 1, we dealt with a single-tier routing scenario with workload segments attached directly to a stateful A/A T0 gateway. In Part 2, we extended the topology to two tiers, with the workload segments attached to a stateful A/A T1 gateway uplinked to a stateful A/A T0 gateway.
If you missed the previous articles, please read them below:
Part 1 : https://vxplanet.com/2023/01/24/nsx-4-0-1-stateful-active-active-gateway-part-1-single-tier-routing/
Part 2: https://vxplanet.com/2023/01/30/nsx-4-0-1-stateful-active-active-gateway-part-2-two-tier-routing/
In this article, we will walk through the routing considerations and packet walks for different stateful active-active gateway topologies. This is not a complete reference for every north-south and east-west routing scenario, but it gives a good understanding of how traffic is punted across edge nodes in each scenario, which helps you plan topologies according to your requirements. We will also see how northbound ECMP varies with the topology used.
Let’s get started.
Scenario 1 : Northbound from a segment on Stateful A/A T0 gateway (South -> North)
For northbound traffic from a workload segment attached directly to a stateful A/A T0 gateway:
- T0 DR construct is available locally, and the lookup happens on the ESXi transport node.
- We have T0 DR to T0 SR ECMP on the ESXi transport node. ECMP hashing forwards traffic to one edge node (e.g. edge 4 in the sketch) irrespective of its sub-cluster membership.
- As traffic reaches the T0 SR – DR backplane interface on edge 4, a hash of the destination IP address is calculated and an edge node is selected again (e.g. edge 2 in the sketch) based on the IP hash.
- This edge node (edge 2) will be responsible for all state information for the specific flow. Traffic is punted from edge 4 to edge 2, from where it egresses over the T0 uplink interfaces.
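The two-stage selection above can be sketched in a few lines of Python. This is only a toy model: the edge names mirror the sketch, the flow tuple is hypothetical, and MD5 stands in for NSX's internal hash functions, which are not published.

```python
import hashlib

# Hypothetical 4-node edge cluster from the sketch.
EDGES = ["edge1", "edge2", "edge3", "edge4"]

def pick_edge(key: str) -> str:
    """Deterministically map a hash key to one edge node."""
    digest = hashlib.md5(key.encode()).digest()
    return EDGES[int.from_bytes(digest, "big") % len(EDGES)]

# Stage 1: the T0 DR on the ESXi host ECMP-hashes the flow to any edge.
flow = ("10.10.1.5", "203.0.113.10", "tcp", 40001, 443)  # hypothetical flow
first_hop = pick_edge("|".join(map(str, flow)))

# Stage 2: that edge hashes the destination IP to pick the flow's state
# owner; if the two differ, the packet is punted across the backplane.
state_owner = pick_edge(flow[1])
punted = first_hop != state_owner
```

The key point the model captures is that the two stages use different keys: stage 1 load-balances on the whole flow, while stage 2 depends only on the destination IP, which is what makes the owner selection reproducible for the return traffic.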

Scenario 2 : Southbound to Segment on Stateful A/A T0 gateway (North -> South)
In scenario 2, we will deal with the return traffic for scenario 1. For southbound traffic (return traffic) to a workload segment attached directly to a stateful A/A T0 gateway:
- The southbound flow from the physical fabric is ECMP hashed to one of the edge nodes and is received on the T0 SR uplink interface of that edge node (e.g. edge 3 in the sketch).
- As the flow is received on the T0 uplink interface on edge 3, a hash of the source IP address is calculated, and as such, the same edge node that was originally chosen for the northbound flow is selected again (e.g. edge 2 in the sketch) based on the source IP hash.
- This edge node (edge 2) will be responsible for all state information for the flow. Traffic is punted from edge 3 to edge 2, from where the southbound lookup happens.
- The T0 DR lookup happens locally on the same edge node (edge 2), from where traffic is tunneled to the ESXi transport node to reach the workload VM.
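It is worth making explicit why the same edge node is guaranteed to be selected for the return traffic: the source IP of the return packet is the destination IP of the forward packet, so hashing it yields the same result. A toy model (edge names and addresses are hypothetical; MD5 stands in for NSX's internal hash):

```python
import hashlib

EDGES = ["edge1", "edge2", "edge3", "edge4"]

def owner(ip: str) -> str:
    """Toy stand-in for the stateful A/A owner-selection hash."""
    digest = hashlib.md5(ip.encode()).digest()
    return EDGES[int.from_bytes(digest, "big") % len(EDGES)]

vm_ip, remote_ip = "10.10.1.5", "203.0.113.10"   # hypothetical addresses

# Northbound: the backplane hashes the DESTINATION IP of the packet.
fwd_owner = owner(remote_ip)

# Southbound return: whichever edge receives the packet hashes its
# SOURCE IP -- which is the forward flow's destination -- so the very
# same owner is selected, keeping both directions of the flow on one edge.
ret_owner = owner(remote_ip)   # the return packet's source IP is remote_ip

assert fwd_owner == ret_owner
```

This direction-dependent choice of hash key (destination IP northbound, source IP southbound) is what makes stateful services possible on an active-active gateway.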

Scenario 3 : Northbound from Segment on Stateful A/A T1 gateway attached to Stateful A/A T0 gateway (South -> North)
For this scenario (stateful A/A T1 gateway attached to a stateful A/A T0 gateway), we require a shared edge cluster to host both the T1 and T0 SR constructs. Having dedicated edge clusters for T1 and T0 is currently not supported.
For northbound traffic from a segment attached to a stateful A/A T1 gateway uplinked to a stateful A/A T0 gateway:
- T1 DR construct is available locally, and the lookup happens on the ESXi transport node.
- We have T1 DR to T1 SR ECMP on the ESXi transport node. ECMP hashing forwards traffic to one edge node (e.g. edge 2 in the sketch) irrespective of its sub-cluster membership.
- As traffic reaches the T1 SR – DR backplane interface on edge 2, a hash of the destination IP address is calculated and an edge node is selected again (e.g. edge 3 in the sketch) based on the IP hash.
- This edge node (edge 3) will be responsible for all state information for the specific flow. Traffic is punted from edge 2 to edge 3, from where all further northbound lookups happen. These lookups are local to the edge node (edge 3) since a shared edge cluster is leveraged.
- As traffic reaches the T0 SR – DR backplane interface on edge 3, a hash of the destination IP address is calculated again; however, the same edge node is selected, and traffic stays local until it egresses over the T0 uplink interfaces.
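The reason the second hash lands on the same edge node can be made explicit: both tiers hash the same destination IP against the same shared edge cluster, so the same member must come out. A toy model (cluster names are hypothetical; MD5 stands in for NSX's internal hash):

```python
import hashlib

# Shared edge cluster hosting both the T1 and T0 SR constructs.
SHARED_CLUSTER = ["edge1", "edge2", "edge3", "edge4"]

def owner(ip: str) -> str:
    """Toy owner-selection hash over the shared cluster."""
    digest = hashlib.md5(ip.encode()).digest()
    return SHARED_CLUSTER[int.from_bytes(digest, "big") % len(SHARED_CLUSTER)]

dst_ip = "203.0.113.10"               # hypothetical destination
t1_owner = owner(dst_ip)              # selected at the T1 SR – DR backplane
t0_owner = owner(dst_ip)              # recomputed at the T0 SR – DR backplane

# Same key, same cluster, same hash => same edge node, so no second punt.
assert t1_owner == t0_owner
```

This is also the intuition behind the shared-cluster requirement: with dedicated clusters per tier, the two hashes would run over different member sets and the locality guarantee would be lost.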

Scenario 4 : Southbound to Segment on Stateful A/A T1 gateway attached to Stateful A/A T0 gateway (North -> South)
In scenario 4, we will deal with the return traffic for scenario 3.
For southbound traffic (return traffic) to a workload segment attached to a stateful A/A T1 gateway uplinked to a stateful A/A T0 gateway:
- The southbound flow from the physical fabric is ECMP hashed to one of the edge nodes and is received on the T0 SR uplink interface of that edge node (e.g. edge 2 in the sketch).
- As the flow is received on the T0 uplink interface on edge 2, a hash of the source IP address is calculated, and as such, the same edge node that was originally chosen for the northbound flow is selected again (e.g. edge 3 in the sketch) based on the source IP hash.
- This edge node (edge 3) will be responsible for all state information for the flow. Traffic is punted from edge 2 to edge 3, from where the southbound lookup happens.
- The T0 DR lookup happens locally on the same edge node (edge 3) since a shared edge cluster is leveraged. As the flow hits the T1 SR uplink interface, a hash of the source IP address is calculated again; however, the same edge node is selected, and traffic stays local, where it is forwarded to the T1 DR construct.
- Traffic is then tunneled to the ESXi transport node to reach the workload VM.

Scenario 5 : Northbound from Segment on DR-only T1 gateway attached to Stateful A/A T0 gateway (South -> North)
For northbound traffic from a segment attached to a DR-only T1 gateway uplinked to a stateful A/A T0 gateway:
- T1 DR and T0 DR constructs are available locally, and the lookups happen on the ESXi transport node.
- We have T0 DR to T0 SR ECMP on the ESXi transport node. ECMP hashing forwards traffic to one edge node (e.g. edge 4 in the sketch) irrespective of its sub-cluster membership.
- As traffic reaches the T0 SR – DR backplane interface on edge 4, a hash of the destination IP address is calculated, and an edge node is selected again (e.g. edge 3 in the sketch) based on the IP hash.
- This edge node (edge 3) will be responsible for all state information for the specific flow. Traffic is punted from edge 4 to edge 3, from where it egresses over the T0 uplink interfaces.

Scenario 6 : Southbound to Segment on DR-only T1 gateway attached to Stateful A/A T0 gateway (North -> South)
In scenario 6, we will deal with the return traffic for scenario 5. For southbound traffic (return traffic) to a workload segment attached to a DR-only T1 gateway uplinked to a stateful A/A T0 gateway:
- The southbound flow from the physical fabric is ECMP hashed to one of the edge nodes and is received on the T0 SR uplink interface of that edge node (e.g. edge 2 in the sketch).
- As the flow is received on the T0 uplink interface on edge 2, a hash of the source IP address is calculated, and as such, the same edge node that was originally chosen for the northbound flow is selected again (e.g. edge 3 in the sketch) based on the source IP hash.
- This edge node (edge 3) will be responsible for all state information for the flow. Traffic is punted from edge 2 to edge 3, from where the southbound lookup happens.
- The T0 DR and then the T1 DR lookups happen locally on the same edge node (edge 3), from where traffic is tunneled to the ESXi transport node to reach the workload VM.

Scenario 7 : East-West between segments on Stateful A/A T1 gateways – 1
Now let’s take a look at east-west flows between segments attached to stateful A/A T1 gateways uplinked to a stateful A/A T0 gateway. In scenario 7, we have traffic originating from a segment on T1-Tenant01-DevApps (left) to a destination segment on T1-Tenant02-StgApps (right).
Note that a shared edge cluster hosts both stateful A/A T1 gateways and the stateful A/A T0 gateway.
- For the northbound traffic originating from segment “LS-DevApps01” on T1-Tenant01-DevApps, the T1 DR construct is available locally, and the lookup happens on the ESXi transport node.
- We have T1 DR to T1 SR ECMP on the ESXi transport node. ECMP hashing forwards traffic to one edge node (e.g. edge 4 in the sketch) irrespective of its sub-cluster membership.
- As traffic reaches the T1 SR – DR backplane interface on edge 4, a hash of the destination IP address is calculated and an edge node is selected again (e.g. edge 2 in the sketch) based on the IP hash.
- This edge node (edge 2) will be responsible for all state information for this flow on the T1 gateway “T1-Tenant01-DevApps”. Traffic is punted from edge 4 to edge 2, from where further northbound lookups happen.
- The T0 DR lookup happens locally on the edge node, edge 2.
- The further lookup for the T1 SR of the destination gateway “T1-Tenant02-StgApps” also happens locally on edge 2.
- As traffic reaches the destination T1 SR uplink interface, it is treated as incoming (southbound) traffic: a hash of the source IP address is calculated, and an edge node is selected to hold the stateful information for the flow. In the sketch, this is edge 4.
- This edge node (edge 4) will be responsible for all state information for this flow on the T1 gateway “T1-Tenant02-StgApps”. Traffic is punted from edge 2 to edge 4, from where further southbound lookups happen.
- The T1 DR lookup happens locally on edge 4, from where traffic is tunneled to the ESXi transport node to reach the workload VM.
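The packet walk above shows that one east-west flow ends up with two state owners, one per stateful T1 gateway, because each gateway hashes a different key. A toy model makes this visible (addresses and edge names are hypothetical; MD5 stands in for NSX's internal hash):

```python
import hashlib

EDGES = ["edge1", "edge2", "edge3", "edge4"]   # shared edge cluster

def owner(ip: str) -> str:
    """Toy owner-selection hash over the shared cluster."""
    digest = hashlib.md5(ip.encode()).digest()
    return EDGES[int.from_bytes(digest, "big") % len(EDGES)]

src_ip = "172.16.10.5"   # hypothetical VM on LS-DevApps01
dst_ip = "172.16.20.5"   # hypothetical VM on LS-StgApps01

# Source-side T1 (northbound leg): owner chosen from the DESTINATION IP.
devapps_owner = owner(dst_ip)

# Destination-side T1 (southbound leg): owner chosen from the SOURCE IP.
stgapps_owner = owner(src_ip)

# Different keys, so the two T1 gateways may pin the flow's state on
# different edge nodes, and one east-west flow can be punted twice.
```

Each owner still holds state for both directions of the flow on its own T1, which is what keeps the stateful services on each gateway consistent.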

Scenario 8 : East-West between segments on Stateful A/A T1 gateways – 2
In scenario 8, we will deal with the return traffic for scenario 7. We have traffic originating from a segment on T1-Tenant02-StgApps (right) to a destination segment on T1-Tenant01-DevApps (left).
Note that a shared edge cluster hosts both stateful A/A T1 gateways and the stateful A/A T0 gateway.
- For the northbound traffic originating from segment “LS-StgApps01” on T1-Tenant02-StgApps, the T1 DR construct is available locally, and the lookup happens on the ESXi transport node.
- We have T1 DR to T1 SR ECMP on the ESXi transport node. ECMP hashing forwards traffic to one edge node (e.g. edge 1 in the sketch) irrespective of its sub-cluster membership.
- As traffic reaches the T1 SR – DR backplane interface on edge 1, a hash of the destination IP address is calculated and an edge node is selected again (e.g. edge 4 in the sketch) based on the IP hash.
- This edge node (edge 4) will be responsible for all state information for this flow on the T1 gateway “T1-Tenant02-StgApps”. Note that this is the same edge node selected for the forward traffic in scenario 7. Traffic is punted from edge 1 to edge 4, from where further northbound lookups happen.
- The T0 DR lookup happens locally on the edge node, edge 4.
- The further lookup for the T1 SR of the destination gateway “T1-Tenant01-DevApps” also happens locally on edge 4.
- As traffic reaches the destination T1 SR uplink interface, it is treated as incoming (southbound) traffic: a hash of the source IP address is calculated, and an edge node is selected to hold the stateful information for the flow. In the sketch, this is edge 2.
- This edge node (edge 2) will be responsible for all state information for this flow on the T1 gateway “T1-Tenant01-DevApps”. Note that this is the same edge node selected for the forward traffic in scenario 7. Traffic is punted from edge 4 to edge 2, from where further southbound lookups happen.
- The T1 DR lookup happens locally on edge 2, from where traffic is tunneled to the ESXi transport node to reach the workload VM.

Scenario 9 : Northbound from Segment on Legacy A/S T1 gateway attached to Stateful A/A T0 gateway (South -> North)
Scenario 9 deals with the attachment of an active-standby T1 gateway to a stateful active-active T0 gateway. This currently requires instantiating the T0 and T1 gateways on separate edge clusters.
For northbound traffic from a segment attached to an active-standby T1 gateway uplinked to a stateful A/A T0 gateway:
- T1 DR construct is available locally, and the lookup happens on the ESXi transport node.
- Traffic is tunneled to the edge node hosting the active SR construct of the T1 gateway (edge 1 in the sketch).
- We have T1 SR to T0 DR ECMP available on edge 1, as the T1 and T0 gateways are instantiated on separate edge clusters. Traffic is tunneled northbound to one of the edge nodes in the T0 edge cluster per the ECMP hash algorithm. In the sketch, it is edge 3.
- As traffic reaches the T0 SR – DR backplane interface on edge 3, a hash of the destination IP address is calculated, and an edge node is selected (e.g. edge 5 in the sketch) to store the stateful information for the flow.
- This edge node (edge 5) will be responsible for all state information for the flow. Traffic is punted from edge 3 to edge 5, from where it egresses over the T0 uplink interfaces.
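The mixed topology above combines two different forwarding decisions: a fixed hop at the active-standby T1 tier and hash-based owner selection at the stateful A/A T0 tier. A toy model (cluster layout mirrors the sketch; MD5 stands in for NSX's internal hash):

```python
import hashlib

# Hypothetical separate clusters, as this scenario requires.
T1_EDGES = {"active": "edge1", "standby": "edge2"}   # A/S T1 cluster
T0_EDGES = ["edge3", "edge4", "edge5", "edge6"]      # stateful A/A T0 cluster

def t0_owner(ip: str) -> str:
    """Toy owner-selection hash over the T0 edge cluster."""
    digest = hashlib.md5(ip.encode()).digest()
    return T0_EDGES[int.from_bytes(digest, "big") % len(T0_EDGES)]

dst_ip = "203.0.113.10"   # hypothetical northbound destination

# Leg 1: no hashing at the T1 tier -- everything goes to the active SR.
t1_hop = T1_EDGES["active"]

# Leg 2: T1 SR to T0 DR ECMP lands on any T0 edge (flow hash), then that
# edge hashes the destination IP to pick the state owner, punting if needed.
flow_owner = t0_owner(dst_ip)
```

Note that only the T0 tier distributes state across edges here; the T1 tier concentrates all tenant traffic on its single active SR.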

Scenario 10 : Southbound to Segment on Legacy A/S T1 gateway attached to Stateful A/A T0 gateway (North -> South)
In scenario 10, we will deal with the return traffic for scenario 9.
For southbound traffic (return traffic) to a workload segment attached to an A/S T1 gateway uplinked to a stateful A/A T0 gateway:
- The southbound flow from the physical fabric is ECMP hashed to one of the edge nodes and is received on the T0 SR uplink interface of that edge node (e.g. edge 4 in the sketch).
- As the flow is received on the T0 uplink interface on edge 4, a hash of the source IP address is calculated, and as such, the same edge node that was originally chosen for the northbound flow is selected again (e.g. edge 5 in the sketch) based on the source IP hash.
- This edge node (edge 5) will be responsible for all state information for the flow. Traffic is punted from edge 4 to edge 5, from where the southbound lookup happens.
- The T0 DR lookup happens locally on the same edge node (edge 5).
- Traffic is tunneled to the edge node (on the separate edge cluster) hosting the active SR construct of the A/S T1 gateway, from where further southbound lookups happen. In the sketch, it is edge 1.
- The T1 DR lookup happens locally on edge 1. Traffic is then tunneled to the ESXi transport node to reach the workload VM.

Now it’s time for a break. This has been a lengthier one 😊
We will meet in Part 4 next, where we deal with edge sub-clusters and failure domains.
I hope the article was informative.
Thanks for reading.
Continue reading? Here are the other parts of this series:
Part 1 : https://vxplanet.com/2023/01/24/nsx-4-0-1-stateful-active-active-gateway-part-1-single-tier-routing/
Part 2: https://vxplanet.com/2023/01/30/nsx-4-0-1-stateful-active-active-gateway-part-2-two-tier-routing/
