NSX-T Tier0 Inter-SR Routing Explained

Inter-SR Routing is an iBGP peering feature between the SR components of a Tier 0 Gateway deployed in Active-Active mode. It helps tolerate an asymmetric failure of the SR component on an Edge node: internal traffic that reaches the affected SR component is routed over the iBGP Inter-SR link to the SR component on the other Edge node for northbound reachability. Currently eBGP and static routes (NSX-created as well as manual) are synced over the iBGP Inter-SR link. Note that this applies only to Active-Active T0 Gateway deployments.

In this article, we will discuss what the T0 Gateway Active-Active architecture looks like, the Ingress/Egress traffic patterns, how to enable Inter-SR routing and how northbound failure scenarios are handled. Before we proceed, I recommend reading my previous post on establishing eBGP peering between the T0 Gateway and Dell Networking ToR switches at the link below. I have used the same infrastructure for this post, and many of the statements in this article build on it.

https://vxplanet.com/2019/06/18/nsx-t-t0-active-active-gateway-and-bgp-peering-with-dell-networking-tor-switches/

Let's get started.

T0 Gateway Active-Active Architecture and Traffic flow

The sketch below shows the logical architecture of a T0 Gateway in an Active-Active deployment, along with the Ingress and Egress traffic directions.

[Figure: InterSRRouting3]

The next sketch shows how the traffic direction is affected during an uplink failure of a T0 SR component. We will discuss this in more detail towards the end of the article.

[Figure: InterSRRouting2]

Let’s discuss the architecture. All the logical segments attach to the DR component of the Tier 1 Gateway. Depending on the stateful services in use, the T1 Gateway also has an SR component deployed. Note that SR components are not distributed in nature and always sit on the Edge nodes. Currently only the Active/Standby option is available for the T1 SR components. The Standby SR component is in an operationally down state and takes over only when the Active SR component fails. The DR and SR components of the T1 Gateway attach to each other using an NSX-T managed link on a 169.254.0.0/28 network (can be modified).

The T1 SR components attach to the DR component of the T0 Gateway over a T0-T1 Transit link on a 100.64.160.X/31 subnet (can be modified). In an Active-Active deployment (with ECMP), the T0 DR component load-shares traffic across both SR components of the T0 Gateway. This is achieved with two default routes on the T0 DR component, each pointing to one of the T0 SR components as the next hop. The DR and SR components attach to each other using an NSX-T managed link on a 169.254.0.0/28 network (can be modified). In reality, a Transit Overlay logical segment is created for the T0 DR-SR connectivity, and it is completely transparent to the user.

On the T0 DR component, the routing entry looks like this:

  • One default route 0.0.0.0/0 pointing to 169.254.0.2/25 (SR1 on Edge node 1) with AD value 1
  • Second default route 0.0.0.0/0 pointing to 169.254.0.3/25 (SR2 on Edge node 2) with AD value 1

Load sharing is then achieved by the T0 DR routing process.
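To make the load sharing idea concrete, here is a minimal Python sketch of per-flow ECMP across the two default-route next hops listed above. The hash function and the choice of flow fields are assumptions made purely for illustration; this is not the hashing that the NSX-T DR actually implements, and pick_next_hop is a hypothetical helper name.

```python
import hashlib

# Next hops taken from the two default routes on the T0 DR described above.
NEXT_HOPS = ["169.254.0.2", "169.254.0.3"]  # SR1 on Edge node 1, SR2 on Edge node 2


def pick_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops=NEXT_HOPS):
    """Map a flow to one of the equal-cost next hops.

    Illustrative only: the hash below is an assumption, not NSX-T's algorithm.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes of the digest as an integer and reduce it
    # modulo the number of equal-cost next hops.
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    # The destination address is a hypothetical external host.
    print(pick_next_hop("172.16.10.11", "8.8.8.8", "tcp", 40000, 443))
```

Because the choice depends only on the flow fields, packets of the same flow always take the same SR component, while different flows spread across both.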

The T0 router has two Uplinks (on the SR components), one via each of the Edge nodes. The SR component uplink connectivity looks like this:

  • SR1 has an Uplink on VLAN 60 over Edge node 1 connecting to Dell Networking L3 Leaf 1
  • SR2 has an Uplink on VLAN 70 over Edge node 2 connecting to Dell Networking L3 Leaf 2

Both SR Components establish eBGP peering on their respective VLANs with the Dell Networking L3 Leaf switches.

With Inter-SR Routing enabled on the BGP process of the T0 Gateway, the two SR components establish an internal SR link between each other over an NSX-managed subnet, 169.254.0.0/25. An iBGP peering is then established between them. We will discuss this in more detail later in this article.

Let’s have a look at the Egress traffic pattern. Any northbound traffic reaching a T0 SR component will prefer the eBGP path to the Leaf switch over the iBGP link to the other SR component. This is because eBGP paths are preferred over iBGP paths in BGP path selection. The Inter-SR link is used only in a failure scenario, for example, a failure of the uplink on SR2.

The same rule applies to Ingress traffic. Each Leaf switch forwards traffic to the T0 SR component with which it has an eBGP relationship, rather than using the iBGP link to its VLT peer.

Enabling Inter-SR Routing for the BGP Process

For more details on how to configure BGP peering between the T0 Gateway and the external Dell EMC Networking L3 Leaf switches, please refer to my previous post below. I am not covering the BGP configuration in this post.

https://vxplanet.com/2019/06/18/nsx-t-t0-active-active-gateway-and-bgp-peering-with-dell-networking-tor-switches/

Let’s do a quick overview of what the configuration looks like:

We have a Tier 0 Gateway deployed in Active-Active mode with two uplinks – One on VLAN 60 via Edge node 1 and second on VLAN 70 via Edge node 2.

We have a Tier 1 Gateway attached to the T0 Gateway. This T1 Gateway has 3 Segments attached to it. The networks used on the Segments are 172.16.10.0/24, 172.16.20.0/24 and 172.16.30.0/24. They are advertised to the T0 Gateway (NSX-T managed).

BGP is enabled on the T0 Gateway and is peered with the L3 Leaf switches over two VLANs, 60 and 70.

This is where the Inter-SR Routing is enabled on the T0 Gateway. We are presented with just a Boolean Toggle button for this. NSX-T takes care of the configuration.

There are now two SR components created for the T0 Gateway, one on each of the Edge nodes. The T0 DR component connects to both of them over an Overlay Transit Segment, as discussed earlier.

Route Redistribution is enabled on the T0 Gateway to advertise all the T1 Connected Segments in BGP.

We can see that the iBGP session is established between the SR components. BFD is not used for this iBGP peering. Instead, the keepalive interval is lowered to 1 sec and the hold-down timer to 3 sec to quickly detect a failure on this link.
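The effect of these timers can be sketched in a few lines of Python. This is only an illustration of the general BGP hold timer behaviour (the timer restarts whenever a KEEPALIVE or UPDATE arrives, and the session is declared down when it expires), using the 1 sec and 3 sec values mentioned above. The function names are hypothetical and this is not how NSX-T implements it.

```python
KEEPALIVE_INTERVAL = 1.0  # seconds, value quoted above for the Inter-SR peering
HOLD_TIME = 3.0           # seconds, value quoted above for the Inter-SR peering


def hold_timer_expired(last_message_at, now, hold_time=HOLD_TIME):
    """Return True once no BGP message has been seen for hold_time seconds."""
    return (now - last_message_at) >= hold_time


def detection_time(failure_at, last_message_at, hold_time=HOLD_TIME):
    """Seconds between the link failure and the hold timer expiring."""
    return (last_message_at + hold_time) - failure_at


if __name__ == "__main__":
    # Last keepalive received at t=10.0, link fails at t=10.5.
    print(detection_time(failure_at=10.5, last_message_at=10.0))
```

With a 1 sec keepalive, a failure always happens less than about 1 sec after the last message, so under these assumptions the peer is declared down somewhere between 2 and 3 sec after the failure.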

Let’s log in to the Edge nodes and check the iBGP Inter-SR links that are created.

Look at the iBGP Neighborship of SR components:

Inter-SR iBGP Route Advertisement

Both T0 SR components exchange eBGP and static routes with each other, with the next hop set to themselves. All these advertised routes carry the BGP community tag NO_EXPORT. This means that any routes learned from the iBGP SR peer won’t be advertised to the external Leaf switches over the eBGP relationship.
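The effect of the NO_EXPORT tag can be sketched as follows. This is a simplified Python model written for this article, not NSX-T or FRR code: the Route class and both function names are hypothetical, and 10.0.0.0/8 is a made-up external prefix. The community value 0xFFFFFF01 is the well-known NO_EXPORT value from RFC 1997.

```python
from dataclasses import dataclass, field

NO_EXPORT = 0xFFFFFF01  # well-known BGP community defined in RFC 1997


@dataclass
class Route:
    prefix: str
    communities: set = field(default_factory=set)


def tag_for_inter_sr(route):
    """Model the outbound route-map on the Inter-SR iBGP peer: add NO_EXPORT."""
    return Route(route.prefix, route.communities | {NO_EXPORT})


def advertise_to_ebgp_peer(routes):
    """A BGP speaker must not send NO_EXPORT routes to external (eBGP) peers."""
    return [r for r in routes if NO_EXPORT not in r.communities]


if __name__ == "__main__":
    # Route received by SR1 from SR2 over the Inter-SR link (hypothetical prefix).
    learned_from_sr2 = tag_for_inter_sr(Route("10.0.0.0/8"))
    # Route SR1 holds locally for a T1 segment, without the tag.
    local = Route("172.16.10.0/24")
    for r in advertise_to_ebgp_peer([learned_from_sr2, local]):
        print(r.prefix)
```

Only the untagged 172.16.10.0/24 route would be sent on to the Leaf switch in this model; the route learned over the Inter-SR link stays inside the T0 Gateway.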

Let’s look at the routes received by each SR component from its iBGP inter-SR peer.

Let’s look at the advertised routes as well.

Let’s look at the Community tagging in the iBGP relationship.

This is the running-configuration of the T0 SR Component. We can see that there is a Route-map named “autogenerated_rmap_for_inter_sr_peers_out” attached to the iBGP neighbor statement. This route map attaches the community tag of NO_EXPORT.

Let’s look at the community tagged routes. We can see that SR 2 on Edge node 2 advertises all the eBGP routes learned from the External Leaf switches and all the T1 Connected routes to its iBGP peer SR1 on Edge node 1 and vice-versa.

Just for additional information, if you want to view or export the running-configuration of the service router, you can enable debug mode and issue the command “get service router running-config”.

Egress – Ingress Traffic Flow direction before a T0 Uplink Failure

Let's revisit the architecture sketch at the beginning of the post. As mentioned, the T0 DR component has two default routes that point to the SR components on both Edge nodes to achieve load sharing in the Active-Active scenario. This is the forwarding table of the T0 DR component.

Below is the forwarding table of the SR components. We can see that each SR component has two routes for the networks advertised from the external Leaf switches: one via the eBGP neighborship with the Leaf switch and the other via the iBGP relationship with the other SR component. As per the BGP path selection procedure, eBGP paths are preferred over iBGP paths. Hence the iBGP links act as a backup and come into the picture only in a failure scenario (e.g. an uplink failure). For routes to the T1 Gateway, each SR component routes the traffic locally to the T1 SR component. A failure is not expected for these routes, but they are still also learned via iBGP. The path selection criterion is different here: locally originated routes take priority over iBGP-learned routes.
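The preference order described above can be sketched in Python as follows. This is a deliberate simplification: real BGP best-path selection compares many more attributes (weight, local preference, AS path length and so on), and here only the path source is ranked. The Path class, the next-hop labels and the 10.0.0.0/8 prefix are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Lower rank wins. Simplified ordering taken from the text above:
# locally originated, then eBGP-learned, then iBGP-learned.
RANK = {"local": 0, "ebgp": 1, "ibgp": 2}


@dataclass(frozen=True)
class Path:
    prefix: str
    source: str    # "local", "ebgp" or "ibgp"
    next_hop: str  # descriptive label only


def best_path(paths, prefix):
    """Return the preferred path for prefix, or None if there is none."""
    candidates = [p for p in paths if p.prefix == prefix]
    if not candidates:
        return None
    return min(candidates, key=lambda p: RANK[p.source])


if __name__ == "__main__":
    # Paths as seen by SR2; 10.0.0.0/8 stands in for an external network.
    ebgp = Path("10.0.0.0/8", "ebgp", "leaf-2")
    ibgp = Path("10.0.0.0/8", "ibgp", "sr1-over-inter-sr-link")
    print(best_path([ebgp, ibgp], "10.0.0.0/8").next_hop)  # uplink healthy
    print(best_path([ibgp], "10.0.0.0/8").next_hop)        # eBGP path withdrawn
```

When the eBGP path disappears, the iBGP path over the Inter-SR link is the only candidate left and becomes the best path, which is the backup behaviour described above.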

The RED arrows show the routes advertised from the external Leaf switches. The BLUE arrow shows the routes on the T1 Gateway.

Let’s look at the BGP table of the Dell EMC Networking Leaf Switches. Leaf 1 eBGP peers with SR 1 on Edge node 1 over VLAN 60. Leaf 2 eBGP peers with SR 2 on Edge node 2 over VLAN 70. The leaf switches are in VLT and are iBGP peers with each other.

Each leaf switch learns about the overlay networks in two ways: one via the eBGP relationship with the respective SR component on the Edges and the other via the iBGP relationship with its VLT peer. As said before, eBGP routes are preferred over iBGP routes, and hence southbound traffic from the leaf switches is routed directly to the respective T0 SR components.

Egress – Ingress Traffic Flow direction after a T0 Uplink Failure

The below sketch shows the traffic flow direction after invoking an Uplink failure. In this case, the Uplink of SR2 on Edge Node 2 is brought down.

[Figure: InterSRRouting2]

The SR component with the failed uplink will lose its eBGP peering with the external Leaf switch. Any northbound traffic that reaches the SR component with the failed uplink is routed over the iBGP Inter-SR link to the SR component on the other Edge node, from where it is routed to the external leaf switches. This is the output from the SR node with the failed Uplink.

Similarly for Ingress traffic, the leaf that loses the eBGP peering with the SR component will choose the iBGP path to its VLT peer, from where the traffic is routed to the respective SR component.

I hope this article was informative and gave you a good learning experience.

Thanks for reading
