Reposted on Medium: https://medium.com/@allamand/using-aws-nlb-manually-targeting-an-eks-service-exposing-udp-traffic-17053ecd8f52
If we try to create a Kubernetes service of type LoadBalancer backed by a Network Load Balancer (NLB) for UDP traffic, we get this error:
This is because UDP support in NLB is more recent than the Kubernetes functionality that creates NLB load balancers.
The bug is reported in issue #79523 and is currently being investigated by AWS.
We are going to work around this limitation.
Meanwhile, we can manually configure an NLB to point to our EKS instances and configure a Kubernetes NodePort service instead of a LoadBalancer.
In NodePort mode, every EC2 instance listens on a pre-defined port (in the 30000-32767 range) and Kubernetes forwards the traffic to the associated pods:
As an example, we are going to expose the Kubernetes CoreDNS pods through a manually created NLB. We choose CoreDNS because it exposes a UDP service on port 53.
Those pods have the label k8s-app: kube-dns, so we can create a new service of type NodePort:
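A minimal sketch of such a manifest, assuming the service name kube-dns-nodeport and the file name kube-dns-nodeport.yaml (both names are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-dns-nodeport      # hypothetical name
  namespace: kube-system       # CoreDNS lives in kube-system
spec:
  type: NodePort
  selector:
    k8s-app: kube-dns          # matches the CoreDNS pods
  ports:
    - name: dns-udp
      protocol: UDP
      port: 53
      targetPort: 53
      nodePort: 30053          # UDP port exposed on every node
    - name: dns-tcp
      protocol: TCP
      port: 53
      targetPort: 53
      nodePort: 30054          # TCP port used for the NLB health check
```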
Let's apply the service definition to the EKS cluster.
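Assuming the manifest above was saved as kube-dns-nodeport.yaml:

```bash
kubectl apply -f kube-dns-nodeport.yaml
```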
We have defined a UDP NodePort 30053 that listens on each instance and forwards to the kube-dns pods on port 53.
We have also defined a TCP NodePort 30054 that the NLB will use as its health check port.
This service must be deployed in the kube-system namespace, as the CoreDNS pods are deployed there.
At this stage, we have created the Kubernetes service, which is already listening on each instance. Before creating the NLB, we can check that this is working from inside the Kubernetes cluster:
We can use a utility pod, eksutils-pod, which is handy for debugging things in Kubernetes.
Create the eksutils pod in the default namespace:
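A sketch, assuming the author's allamand/eksutils image (any image shipping dig, nc, and kubectl would do):

```bash
# Long-running utility pod in the default namespace
kubectl run eksutils -n default --image=allamand/eksutils \
  --restart=Never --command -- sleep infinity
```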
Then we can connect inside the pod and check that we can resolve DNS names of the internal services.
In the example below, for each of the Kubernetes nodes I query the grafana service in the metrics namespace; change this to point to an existing service in your cluster. It also checks the connection on the TCP port 30054.
This will make a DNS lookup for the grafana service in the metrics namespace on each node on port 30053 UDP, and for each node try to connect on port 30054 TCP.
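A sketch of that test loop, run from inside the pod (kubectl exec -ti eksutils -- zsh, or bash); grafana.metrics is the example service, substitute your own:

```bash
# Iterate over the internal IPs of all nodes
for node in $(kubectl get nodes \
  -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
  # UDP DNS lookup through the 30053 NodePort
  dig +short @$node -p 30053 grafana.metrics.svc.cluster.local
  # TCP connection test on the health check NodePort
  nc -zv -w 2 $node 30054
done
```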
Get the subnets of your instances in the VPC of your EKS cluster.
Note: this works only if you have a dedicated VPC for your EKS cluster. If that is not the case, you need to filter to match only the instances you want.
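For example (the VPC ID placeholder is yours to fill in):

```bash
VPC_ID=<your-eks-vpc-id>
SUBNETS=$(aws ec2 describe-subnets \
  --filters Name=vpc-id,Values=$VPC_ID \
  --query 'Subnets[].SubnetId' --output text)
```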
First, create the NLB using the subnets of your worker nodes:
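A sketch using the $SUBNETS and $VPC_ID variables above; the target-group name kube-dns-nlb-tg is an assumption (the NLB name kube-dns-nlb is the one used later in the article):

```bash
# Create the network load balancer in the worker-node subnets
NLB_ARN=$(aws elbv2 create-load-balancer \
  --name kube-dns-nlb --type network --subnets $SUBNETS \
  --query 'LoadBalancers[0].LoadBalancerArn' --output text)

# Target group: UDP traffic to NodePort 30053, TCP health check on 30054
TG_ARN=$(aws elbv2 create-target-group \
  --name kube-dns-nlb-tg --protocol UDP --port 30053 \
  --target-type instance --vpc-id $VPC_ID \
  --health-check-protocol TCP --health-check-port 30054 \
  --query 'TargetGroups[0].TargetGroupArn' --output text)

# Listener: expose UDP 53 on the NLB and forward to the target group
aws elbv2 create-listener --load-balancer-arn $NLB_ARN \
  --protocol UDP --port 53 \
  --default-actions Type=forward,TargetGroupArn=$TG_ARN
```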
We want to add every node that is part of our EKS cluster as a target of our NLB target group. Get the list of instances:
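For example, filtering on the cluster VPC (valid for a dedicated VPC, as noted earlier):

```bash
INSTANCES=$(aws ec2 describe-instances \
  --filters Name=vpc-id,Values=$VPC_ID Name=instance-state-name,Values=running \
  --query 'Reservations[].Instances[].InstanceId' --output text)
```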
Register the instances:
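Using the variables defined above:

```bash
# Expand $INSTANCES into Id=i-... Id=i-... arguments
aws elbv2 register-targets --target-group-arn $TG_ARN \
  --targets $(printf 'Id=%s ' $INSTANCES)
```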
Check the listener:
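With the NLB ARN captured earlier:

```bash
aws elbv2 describe-listeners --load-balancer-arn $NLB_ARN
```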
Check the health of the targets:
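With the target-group ARN captured earlier:

```bash
aws elbv2 describe-target-health --target-group-arn $TG_ARN
```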
They should be unhealthy until we configure the security groups of the instances.
To allow the health check, we need to open port 30054 in the security groups of our instances so it can be reached by the IPs of the NLB.
Get the security groups from the instance IDs, for all instances:
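A sketch that deduplicates the security group IDs attached to the instances:

```bash
SGS=$(aws ec2 describe-instances --instance-ids $INSTANCES \
  --query 'Reservations[].Instances[].SecurityGroups[].GroupId' \
  --output text | tr '\t' '\n' | sort -u)
```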
Add a rule to the security groups:
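A sketch; 192.168.0.0/16 is the VPC range used here, and the extra UDP rule reflects that an NLB with instance targets preserves the client source IP, so clients must be able to reach the UDP NodePort directly:

```bash
for sg in $SGS; do
  # Health check port, reachable from the NLB addresses inside the VPC
  aws ec2 authorize-security-group-ingress --group-id $sg \
    --protocol tcp --port 30054 --cidr 192.168.0.0/16
  # NLB instance targets see the original client IP, so the UDP NodePort
  # must be open to your clients (here: everywhere; tighten as needed)
  aws ec2 authorize-security-group-ingress --group-id $sg \
    --protocol udp --port 30053 --cidr 0.0.0.0/0
done
```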
Instead of opening from 192.168.0.0/16, you can restrict the rule to the actual NLB IP addresses, using:
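One way to list them, assuming the NLB network interfaces keep their standard description format (ELB net/&lt;nlb-name&gt;/&lt;id&gt;):

```bash
aws ec2 describe-network-interfaces \
  --filters "Name=description,Values=ELB net/kube-dns-nlb/*" \
  --query 'NetworkInterfaces[].PrivateIpAddress' --output text
```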
Once the targets are healthy, you can test the access. Find the URL of your NLB and test its access (be sure you have internet access for port 53):
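For example:

```bash
NLB_DNS=$(aws elbv2 describe-load-balancers --names kube-dns-nlb \
  --query 'LoadBalancers[0].DNSName' --output text)
```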
Test our newly exposed domain name server with dig:
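Reusing the example service from the in-cluster test:

```bash
dig @$NLB_DNS grafana.metrics.svc.cluster.local
```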
We can see here that we are able to query our exposed DNS server.
If you add EKS instances, you'll need to add those instances to your NLB target group so the load is spread over them too.
When scaling down, the old instances will become unhealthy and will automatically be deregistered and disappear from the target group, so no specific action is needed in this case.
We want a way to automate the addition of new targets to the NLB when scaling out.
We can configure Amazon EC2 Auto Scaling to send events to CloudWatch Events whenever our Auto Scaling group scales.
The idea is that each time an instance is launched by one of the Auto Scaling groups of our EKS cluster, it is automatically added to the NLB target group.
Create the Lambda function that will add instances launched via the Auto Scaling group to the NLB target group.
You need to set LbName to the name of the NLB you want to add the instances to.
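A minimal sketch of such a function in Python (the handler layout and the LbName value are assumptions; adapt to your setup):

```python
import boto3

elbv2 = boto3.client('elbv2')
LB_NAME = 'kube-dns-nlb'  # LbName: set to the name of your NLB

def lambda_handler(event, context):
    # The Auto Scaling launch event carries the new instance ID
    instance_id = event['detail']['EC2InstanceId']
    # Look up the target groups attached to our NLB
    lb = elbv2.describe_load_balancers(Names=[LB_NAME])['LoadBalancers'][0]
    tgs = elbv2.describe_target_groups(
        LoadBalancerArn=lb['LoadBalancerArn'])['TargetGroups']
    # Register the new instance in every target group of the NLB
    for tg in tgs:
        elbv2.register_targets(TargetGroupArn=tg['TargetGroupArn'],
                               Targets=[{'Id': instance_id}])
    return 'registered %s' % instance_id
```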
Create the role for our Lambda function:
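A sketch, with an assumed role name nlb-register-lambda-role:

```bash
cat > trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "lambda.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
ROLE_ARN=$(aws iam create-role --role-name nlb-register-lambda-role \
  --assume-role-policy-document file://trust.json \
  --query 'Role.Arn' --output text)
```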
Attach policies to the role:
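Basic execution for CloudWatch Logs, plus the ELBv2 calls the sketched function makes:

```bash
aws iam attach-role-policy --role-name nlb-register-lambda-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

cat > elbv2-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "elasticloadbalancing:DescribeLoadBalancers",
      "elasticloadbalancing:DescribeTargetGroups",
      "elasticloadbalancing:RegisterTargets"
    ],
    "Resource": "*"
  }]
}
EOF
aws iam put-role-policy --role-name nlb-register-lambda-role \
  --policy-name elbv2-register --policy-document file://elbv2-policy.json
```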
Create the Lambda function (I've created a zip with the previous Lambda code):
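Assuming the code above is in lambda_function.py and the function name nlb-register-targets:

```bash
zip function.zip lambda_function.py
aws lambda create-function --function-name nlb-register-targets \
  --runtime python3.9 --handler lambda_function.lambda_handler \
  --zip-file fileb://function.zip --role $ROLE_ARN
```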
Test the function with a JSON test event:
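A sketch of a test event mimicking an Auto Scaling launch (the instance ID is fake on purpose):

```bash
cat > test-event.json <<'EOF'
{
  "source": "aws.autoscaling",
  "detail-type": "EC2 Instance Launch Successful",
  "detail": { "EC2InstanceId": "i-0123456789abcdef0" }
}
EOF
# With AWS CLI v2, also add: --cli-binary-format raw-in-base64-out
aws lambda invoke --function-name nlb-register-targets \
  --payload file://test-event.json out.json && cat out.json
```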
You should get a normal error saying that:
Adapt the following event pattern rule with the names of the Auto Scaling groups you need. In my case, I have 3 Auto Scaling groups associated with my EKS cluster:
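A sketch of the pattern; the ASG names are placeholders for your own:

```bash
cat > eks-asg-pattern.json <<'EOF'
{
  "source": ["aws.autoscaling"],
  "detail-type": ["EC2 Instance Launch Successful"],
  "detail": {
    "AutoScalingGroupName": ["eks-asg-1", "eks-asg-2", "eks-asg-3"]
  }
}
EOF
```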
Create the CloudWatch rule
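Using the pattern file above (the rule name is an assumption):

```bash
RULE_ARN=$(aws events put-rule --name eks-asg-scale-out \
  --event-pattern file://eks-asg-pattern.json \
  --query 'RuleArn' --output text)
```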
Get the Lambda ARN:
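For example:

```bash
LAMBDA_ARN=$(aws lambda get-function --function-name nlb-register-targets \
  --query 'Configuration.FunctionArn' --output text)
```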
Add the Lambda function as a target of the CloudWatch Event rule:
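Wiring the rule to the function:

```bash
aws events put-targets --rule eks-asg-scale-out \
  --targets "Id"="1","Arn"="$LAMBDA_ARN"
```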
Add permission for the Lambda to be triggered by the event:
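Granting CloudWatch Events the right to invoke the function:

```bash
aws lambda add-permission --function-name nlb-register-targets \
  --statement-id asg-events --action lambda:InvokeFunction \
  --principal events.amazonaws.com --source-arn $RULE_ARN
```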
We create a CloudWatch rule to associate our Auto Scaling groups with our Lambda function.
In the console, select Auto Scaling as the event pattern, and select all the Auto Scaling groups you want to monitor.
For the target, simply choose the Lambda function we just created.
You can now test scaling your ASG out or in, or simply terminate some instances: the ASG will send an event to the Lambda each time a new instance is created in the matching ASGs, and the Lambda will register the new instance in the NLB target group.
Here I will just show how you can add instances to the NLB manually using the CLI. Normally you won't need to do this, except perhaps the first time, if you created the instances before setting up the automation above.
Retrieve the list of instances in your VPC that you want to add to the load balancer...
...or get the instances from the Kubernetes API:
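Two sketches, one per approach (node providerIDs look like aws:///&lt;az&gt;/&lt;instance-id&gt;):

```bash
# From EC2, filtering by VPC
INSTANCES=$(aws ec2 describe-instances \
  --filters Name=vpc-id,Values=$VPC_ID Name=instance-state-name,Values=running \
  --query 'Reservations[].Instances[].InstanceId' --output text)

# ...or from the Kubernetes API, extracting the instance IDs from providerID
INSTANCES=$(kubectl get nodes -o jsonpath='{.items[*].spec.providerID}' \
  | tr ' ' '\n' | sed 's|.*/||')
```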
Manually add the NLB targets:
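Same call as in the setup section, reusing the $INSTANCES variable:

```bash
aws elbv2 register-targets --target-group-arn $TG_ARN \
  --targets $(printf 'Id=%s ' $INSTANCES)
```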
This will add every instance from the $INSTANCES variable as a target of the kube-dns-nlb target group.
Because the service is in NodePort mode, every instance listens on this port and forwards traffic to the target service, in this case the kube-dns service, which itself targets the CoreDNS pods.
When upgrading the version of a node group, EKS starts a rolling upgrade of the instances. Each old instance will disappear from the NLB when it is deleted, and new ones will automatically be added by our CloudWatch Event + Lambda function.
Deleting the target group:
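Using the ARN captured at creation time:

```bash
# Fails with ResourceInUse while a listener still forwards to the target
# group; delete the load balancer (next step) first in that case
aws elbv2 delete-target-group --target-group-arn $TG_ARN
```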
Deleting the load balancer:
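And for the NLB itself:

```bash
aws elbv2 delete-load-balancer --load-balancer-arn $NLB_ARN
```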