Inconsistent Networking Behaviour between Tanzu Kubernetes Clusters

When working with vSphere with Tanzu on NSX-T, the cluster-to-cluster communication between Tanzu Kubernetes Clusters on the same vSphere Workload Management Cluster was observed to deviate from other combinations of communication.

This means that we cannot use the Tanzu Supervisor Namespace Egress IP as the source IP when we create Gateway Firewall rules to allow cluster-to-cluster communication.

This is due to the default NAT rules that NSX-T creates when a new Supervisor Namespace is created:

NSX-T NAT Rules

A quick look at the 3 NAT rules set:

No SNAT for traffic going from Node IPs (10.222.0.0/16) to Ingress IPs (10.223.112.0/20)
- No SNAT is applied to any traffic going from Cluster A to Cluster B
No SNAT for traffic going from Node IPs (10.222.0.0/16) to Node IPs (10.222.0.0/16)
- No SNAT is applied to any traffic going from Node A to Node B
SNAT for all other traffic, translated to the Tanzu Supervisor Namespace Egress IP (10.223.136.18)
- SNAT is applied to any other traffic, such as from Cluster A to another Virtual Machine in another segment, and egress the Tanzu Supervisor Namespace segment through the Tanzu Supervisor Namespace Egress IP

What does this mean?

Rule 1:

Rule 1 illustration

For Cluster-to-Cluster traffic, SNAT rule 1 is applied, Cluster B will see the incoming traffic IP address as the Node IP of Cluster A

Rule 2:

Rule 2 illustration

For traffic within the Cluster, SNAT rule 2 is applied, Nodes within Cluster A will see each other's IP address. This is working as intended.

Rule 3:

Rule 3 illustration

For Cluster-to-Virtual-Machine traffic, SNAT rule 3 is applied, the Virtual Machine will see the incoming traffic IP address as the Tanzu Supervisor Namespace Egress IP which Cluster A resides in. This is working as intended.

Rule 3 illustration

In fact, for network traffic that leaves the T0 Gateway to an external Kubernetes cluster, the external Kubernetes cluster will also see the incoming traffic IP address as the Tanzu Supervisor Namespace Egress IP which Cluster A resides in. This is also working as intended.

Hypothesis

Because of the implementation of vSphere Pods, where the Tanzu Supervisor Namespaces are treated as Kubernetes namespaces, and vSphere Pods are treated as Kubernetes pods, the network communication between Kubernetes pods in different namespaces should show the internal IP addresses and not the Tanzu Supervisor Namespace egress IP. In this implementation, SNAT Rule 1 is valid.

The problem happens when SNAT Rule 1 is retained in NSX-T even after the Tanzu Kubernetes Cluster deployment model was selected, where the Tanzu Kubernetes Cluster nodes should not exist in the same segment as other Tanzu Kubernetes Cluster nodes when they exist in different Tanzu Supervisor Namespaces.

Workaround

Since we cannot use the Tanzu Supervisor Namespace Egress IP as the source, we need to find a way to obtain the IP address of the Cluster Nodes.

If we do some exploring, we observe there are some default NSX-T Distributed Firewall rules that are created whenever a new Supervisor Namespace is created to allow intra-cluster communication by default.

Auto-generated security group in NSX-T Distributed Firewall for Supervisor Namespace

These rules are applied to 2 security groups, by clicking the security groups, we can see that one of the security groups contain all Cluster Nodes that exist within the Tanzu Supervisor Namespace.

Members in auto-generated security group

We can then select this security group as our source when creating Gateway Firewall rules, and the network communication will work as desired.

The downside is this solution does not have the ability to set fine-grained rules for Tanzu Kubernetes Clusters that exist in the same Tanzu Supervisor Namespace, as the security group contains everything.

This means for Cluster A and Cluster B, both existing in the same Tanzu Supervisor Namespace, we cannot allow traffic from Cluster A but deny traffic from Cluster B.