BGP Considerations in NSX-T Integrated VMware Enterprise PKS

Since NSX-T version 2.5, VMware Enterprise PKS supports two deployment models with respect to T0 and T1 Gateways:

Active-Standby T0 Gateway with a dedicated T1 Gateway per Kubernetes namespace.
Active-Active T0 Gateway with a dedicated T1 Gateway per Kubernetes cluster (NSX-T v2.5 onwards).

In the first deployment model, a T1 Gateway is created for every namespace created in the Kubernetes cluster. Each namespace gets a unique subnet carved out of a POD block defined in NSX-T Manager. All stateful services in the deployment (such as SNAT/DNAT rules for the Kubernetes PODs) are created on the Tier 0 Gateway, so the T0 has to be deployed in Active-Standby mode. With BGP configured as the routing protocol, we get only a single northbound path, because ECMP can't be leveraged in an Active-Standby T0 deployment. Also, as the Kubernetes cluster grows, the number of T1 objects created in NSX-T Manager grows with it, which clutters the NSX-T inventory.

The second model is a simplified deployment option, available since NSX-T v2.5, which uses a shared Tier 1 Gateway. Here a dedicated Tier 1 Gateway is assigned to a Kubernetes cluster (rather than to a namespace), and all stateful services for that cluster are pushed down from the T0 to its dedicated T1. This gives us the flexibility to deploy T0 Gateways in Active-Active mode and leverage ECMP for North-South routing.
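To put the difference in T1 object count into numbers, here is a minimal Python sketch. It counts only the T1 Gateways described above (one per namespace in the first model, one per cluster in the second) and ignores any other objects PKS or NSX-T may create. The function names and the sample namespace counts are hypothetical, chosen purely for illustration.

```python
def t1_count_dedicated_model(namespaces_per_cluster):
    """First model: one T1 Gateway per namespace, summed over all clusters."""
    return sum(namespaces_per_cluster)


def t1_count_shared_model(namespaces_per_cluster):
    """Second model: one T1 Gateway per cluster, regardless of namespaces."""
    return len(namespaces_per_cluster)


# Hypothetical environment: three clusters with 4, 10 and 25 namespaces.
clusters = [4, 10, 25]

dedicated = t1_count_dedicated_model(clusters)
shared = t1_count_shared_model(clusters)

print("T1 Gateways, dedicated T1 per namespace:", dedicated)
print("T1 Gateways, shared T1 per cluster:", shared)
```

In the first model the count follows the number of namespaces, while in the second it follows only the number of clusters.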

In both deployment models, we use BGP (or static routes if necessary) to advertise the NSX-T networks (Kubernetes node network, POD network, management network and the floating networks) based on the adopted topology. To learn more about the supported topologies associated with these deployment models, have a look at the Pivotal reference below:

https://docs.pivotal.io/pks/1-6/nsxt-topologies.html

The PKS subnets are defined in NSX-T prior to deployment. Below are the subnets I am using for this article, each with a short description.

Node Network – This is the network that will be used by the Kubernetes nodes deployed by the PKS BOSH Director. We will define a /16 pool here; PKS will carve a /24 block out of it, and NSX-T Manager will associate that block with a dedicated Tier 1 segment. The node network used in this setup is 172.31.0.0/16. This is a routable subnet.

POD Network – This is the network that will be used by each Kubernetes namespace. We will define a /16 pool here; PKS will carve a /24 block out of it, and NSX-T Manager will associate that block with a dedicated Tier 1 segment. All PODs in the same namespace attach to the same logical segment. The POD network used in this setup is 172.30.0.0/16. This is a non-routable subnet, so a NAT instance is needed for the PODs to get external access.

Floating Pool Network – This is a routable block that will be used for SNAT instances and load balancer VIPs. We will define a /24 block, and NSX-T will carve a /32 IP out of it whenever a load balancer instance is required for Kubernetes (such as the LB for the Kube API or an Ingress Controller) or an SNAT instance is required for a POD network. The floating pool network used in this setup is 192.168.105.0/24.

PKS Management Network – This is the management network that hosts the PKS management and control plane VMs. It sits on a dedicated T1 instance attached to the T0 Gateway. The management network is 192.168.101.0/24 and is created manually prior to PKS deployment.
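The carving described for these pools can be illustrated with Python's standard ipaddress module. This is only a sketch of the subnet arithmetic, not of how PKS or NSX-T allocate blocks internally. The helper name carve and the ascending allocation order are assumptions made for illustration; the pools are the ones used in this article.

```python
import ipaddress
from itertools import islice

# Pools used in this article.
NODE_POOL = ipaddress.ip_network("172.31.0.0/16")         # routable
POD_POOL = ipaddress.ip_network("172.30.0.0/16")          # non-routable, needs NAT
FLOATING_POOL = ipaddress.ip_network("192.168.105.0/24")  # routable


def carve(pool, new_prefix, count):
    """Return the first `count` blocks of length `new_prefix` from `pool`.

    Hypothetical helper: the ascending order is an assumption, not
    necessarily the order in which PKS/NSX-T hand out blocks.
    """
    return list(islice(pool.subnets(new_prefix=new_prefix), count))


node_blocks = carve(NODE_POOL, 24, 2)  # /24 blocks for node networks
pod_blocks = carve(POD_POOL, 24, 3)    # /24 blocks, one per namespace

# Single addresses from the floating pool for SNAT rules and LB VIPs.
# Which addresses NSX-T actually picks is not covered here.
floating_ips = list(islice(FLOATING_POOL.hosts(), 2))

print("node blocks:", [str(n) for n in node_blocks])
print("POD blocks:", [str(n) for n in pod_blocks])
print("floating IPs:", [f"{ip}/32" for ip in floating_ips])

# How many blocks each pool can supply.
print("/24 blocks per /16 pool:", 2 ** (24 - NODE_POOL.prefixlen))
print("addresses in floating pool:", FLOATING_POOL.num_addresses)

# Every carved block stays inside its parent pool.
assert all(b.subnet_of(POD_POOL) for b in pod_blocks)
```

Each /16 pool supplies 256 blocks of size /24, which is the upper bound on the number of segments that can be carved from it.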

This is the architecture of the Active-Standby T0 Gateway PKS deployment model with dedicated T1 Gateways per Kubernetes namespace.

Observations from BGP advertisement

Few Routing Decisions

Enterprise PKS Architecture (Active-Standby T0 Gateway)

PKS Deployed NSX-T Objects and Subnets

A look at the BGP Table on Leaf Switches

BGP Route Summarization and Filtering


Published by HariKrishnan


Discover more from VxPlanet
