---
title: Using AWS NLB manually targeting an EKS Service exposing UDP traffic
tags: AWS, EKS, NLB, UDP
breaks: false
---
# Using AWS NLB manually targeting an EKS Service exposing UDP traffic
Reposted on Medium: https://medium.com/@allamand/using-aws-nlb-manually-targeting-an-eks-service-exposing-udp-traffic-17053ecd8f52
## Problem encountered with EKS 1.16
If we try to create a service of type LoadBalancer backed by a [Network Load Balancer](https://aws.amazon.com/elasticloadbalancing/) (NLB) for UDP traffic, we get this error:
```
Error creating load balancer (will retry): failed to ensure load balancer for service default/test: Only TCP LoadBalancer is supported for AWS ELB
```
This is because UDP support for NLB is more [recent](https://aws.amazon.com/blogs/aws/new-udp-load-balancing-for-network-load-balancer/) than the NLB integration built into Kubernetes.
The bug is reported in issue [#79523](https://github.com/kubernetes/kubernetes/issues/79523) and is currently being investigated by AWS.
We are going to work around this limitation.
## Using NodePort Kubernetes service
Meanwhile, we can manually configure an NLB to point to our EKS instances and create a Kubernetes **NodePort** service instead of a **LoadBalancer** service.
In NodePort mode, every EC2 instance listens on a pre-defined port (in the range 30000-32767), and Kubernetes forwards the traffic to the associated Kubernetes pods:
![](https://i.imgur.com/N85R3Of.png)
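As a quick sanity check (a minimal sketch, not part of the walkthrough), the node ports we pick later (30053 and 30054) must fall inside Kubernetes' default `service-node-port-range`:

```python
# Kubernetes only allocates nodePorts inside the default
# service-node-port-range (30000-32767); this helper mirrors that check.
def is_valid_nodeport(port: int) -> bool:
    """Return True if `port` falls in Kubernetes' default NodePort range."""
    return 30000 <= port <= 32767
```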
## Example: exposing kube-dns with NLB
As an example, we are going to expose the Kubernetes core-dns pods through a manually created NLB. We choose core-dns because it exposes a UDP service on port 53.
### Creating the kubernetes NodePort service
Those pods have the label `k8s-app: kube-dns`, so we can create a new service of type NodePort:
```
cat << EOF > service-kube-dns-nodeport.yaml
apiVersion: v1
kind: Service
metadata:
  name: service-kube-dns-nodeport
  namespace: kube-system
  labels:
    app.kubernetes.io/name: service-kube-dns-nodeport
spec:
  type: NodePort
  ports:
    - name: dns-udp
      port: 53
      targetPort: 53
      nodePort: 30053
      protocol: UDP
    - name: dns-tcp
      port: 53
      targetPort: 53
      nodePort: 30054
      protocol: TCP
  selector:
    k8s-app: kube-dns
EOF
```
Let's apply the service definition to the EKS cluster.
```
kubectl apply -f service-kube-dns-nodeport.yaml
```
We have defined a UDP nodePort 30053 that will be listening on each instance and forwarding to the kube-dns pods on port 53.
We have also defined a TCP nodePort 30054 that the NLB will use for its health checks.
> This service must be deployed in the **kube-system** namespace because the CoreDNS pods are deployed there.
#### (Optional) Test internally the service works
At this stage, we have created the Kubernetes service, which is already listening on each instance. Before creating the NLB, we can check that it works from inside the Kubernetes cluster.
We can use a purpose-built pod, [eksutils-pod](https://raw.githubusercontent.com/allamand/eksutils/master/eksutils-pod.yaml), which is handy for debugging things in Kubernetes.
Create the eksutils pod in the default namespace:
```
kubectl apply -f https://raw.githubusercontent.com/allamand/eksutils/master/eksutils-pod.yaml
```
Then we can connect inside the pod and see whether we can resolve internal services through the NodePort.
In the example below, for each of the Kubernetes nodes I query the grafana service in the metrics namespace; change this to point to a service that exists in your cluster. It also checks the connection on the TCP port 30054.
```
$ kubectl exec -n default -ti eksutils-pod zsh
eksutils@eksutils-pod $ for x in $(kubectl get nodes -o wide | awk '{print $6}' | grep -v INTERNAL); do echo $x ; dig @$x -p 30053 grafana.metrics.svc.cluster.local ; telnet $x 30054 ; done
```
This performs a DNS lookup for the grafana service in the metrics namespace against each node on UDP port 30053, and for each node also tries to connect on TCP port 30054.
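The `telnet` part of that loop only checks whether the TCP health-check port accepts connections. As an illustration (a minimal sketch, not from the original post), the same probe can be written in Python:

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Poor man's `telnet`: return True if a TCP connection to
    (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

You would call it as `tcp_port_open(node_ip, 30054)` for each node IP returned by `kubectl get nodes -o wide`.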
### Create the Network Load Balancer
Get the subnets of the instances in the same VPC as your EKS cluster:
```
VPC_ID=vpc-027f50fc9d05149f0
aws ec2 describe-instances --filters Name=network-interface.vpc-id,Values=$VPC_ID \
--query 'Reservations[*].Instances[*].SubnetId' --output text | sort | uniq -c
5 subnet-0378859fcc9e53fa6
5 subnet-055b4b800624d3b99
```
> Note: this works only if you have a dedicated VPC for your EKS cluster. If that is not the case, filter so you match only the instances you want.
First, create the NLB using the subnets of your worker nodes:
```
NLB_NAME=kube-dns-nlb
aws elbv2 create-load-balancer --name $NLB_NAME \
--type network \
--subnets subnet-0378859fcc9e53fa6 subnet-055b4b800624d3b99
```
#### Create the target Group for your NLB
```
TG_NAME=kube-dns-tg
aws elbv2 create-target-group --name $TG_NAME --protocol UDP --port 30053 --vpc-id $VPC_ID \
--health-check-protocol TCP \
--health-check-port 30054 \
--target-type instance
```
#### Register Instances in the target group
We want to add every node that is part of our EKS cluster as a target of our NLB target group. Get the list of instances:
```
INSTANCES=$(kubectl get nodes -o json | jq -r ".items[].spec.providerID" | cut -d'/' -f5)
IDS=$(for x in `echo $INSTANCES`; do echo Id=$x ; done | tr '\n' ' ')
echo $IDS
```
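The two shell lines above extract the EC2 instance id (the last segment of each node's `providerID`) and format it as the `Id=…` pairs that `register-targets` expects. The same transformation in Python, as an illustrative sketch:

```python
def instance_ids_from_provider_ids(provider_ids):
    """Extract EC2 instance ids from Kubernetes providerIDs.

    A providerID looks like 'aws:///us-east-1a/i-0123456789abcdef0';
    the instance id is the last path segment (same as `cut -d'/' -f5`).
    """
    return [pid.rsplit("/", 1)[-1] for pid in provider_ids]

def to_target_ids(instance_ids):
    """Format ids the way `aws elbv2 register-targets` expects them."""
    return " ".join(f"Id={i}" for i in instance_ids)
```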
Register the instances:
```
TG_ARN=$(aws elbv2 describe-target-groups --query 'TargetGroups[?TargetGroupName==`kube-dns-tg`].TargetGroupArn' --output text)
aws elbv2 register-targets --target-group-arn $TG_ARN --targets $(echo $IDS)
```
#### Create a Listener for the target-group
```
LB_ARN=$(aws elbv2 describe-load-balancers --names $NLB_NAME --query 'LoadBalancers[0].LoadBalancerArn' --output text)
echo $LB_ARN
```
```
aws elbv2 create-listener --load-balancer-arn $LB_ARN \
--protocol UDP --port 53 \
--default-actions Type=forward,TargetGroupArn=$TG_ARN
```
Check the listener:
```
aws elbv2 describe-listeners --load-balancer-arn $LB_ARN
```
Check the health of the targets:
```
aws elbv2 describe-target-health --target-group-arn $TG_ARN
```
> The targets should be unhealthy until we configure the security groups of the instances.
### Configure Instances Security Groups
In order for the health check to pass, we need to allow port 30054 in the security groups of our instances, so that it can be reached from the IPs of the NLB.
Get the security group of each instance from the instance IDs:
```
SGs=$(for x in $(echo $INSTANCES); do aws ec2 describe-instances --filters Name=instance-id,Values=$x \
--query 'Reservations[*].Instances[*].SecurityGroups[0].GroupId' --output text ; done | sort | uniq)
```
Add rules to the security groups:
```
for x in $(echo $SGs); do
echo SG=$x;
aws ec2 authorize-security-group-ingress --group-id $x --protocol tcp --port 30054 --cidr 192.168.0.0/16;
aws ec2 authorize-security-group-ingress --group-id $x --protocol udp --port 30053 --cidr 0.0.0.0/0 ;
done
```
> Instead of opening from 192.168.0.0/16, you can open only to the actual NLB IP addresses, obtained with:
>
>```
>NLB_NAME_ID=$(aws elbv2 describe-load-balancers --names $NLB_NAME --query 'LoadBalancers[0].LoadBalancerArn' --output text | awk -F":loadbalancer/" '{print $2}')
>aws ec2 describe-network-interfaces \
> --filters Name=description,Values="ELB $NLB_NAME_ID" \
> --query 'NetworkInterfaces[*].PrivateIpAddresses[*].PrivateIpAddress' --output text
>```
### Test it
Once the targets are healthy, you can test the access. Find the DNS name of your NLB and test access to it (make sure port 53 is reachable from where you test):
```
LB_DNS=$(aws elbv2 describe-load-balancers --name $NLB_NAME --query 'LoadBalancers[0].DNSName' --output text)
echo $LB_DNS
```
Test our newly exposed domain name server with dig:
```
$ dig @$LB_DNS grafana.metrics.svc.cluster.local
; <<>> DiG 9.10.6 <<>> @kube-dns-nlb-a688e8ddf1200136.elb.us-east-1.amazonaws.com grafana.metrics.svc.cluster.local
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53980
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;grafana.metrics.svc.cluster.local. IN A
;; ANSWER SECTION:
grafana.metrics.svc.cluster.local. 5 IN A 10.100.165.230
;; Query time: 98 msec
;; SERVER: 54.164.194.190#53(54.164.194.190)
;; WHEN: Wed May 13 19:27:47 CEST 2020
;; MSG SIZE rcvd: 111
```
We can see that we are able to query our exposed DNS server.
## Automate target creation in our load balancer
### Scaling Instances
If you add EKS instances, you'll need to add those instances to your NLB target group so that the load is spread across them too.
When scaling down, the old instances become unhealthy, are automatically deregistered, and disappear from the target group, so no specific action is needed in that case.
We want a way to automate the addition of new targets to the NLB when we scale out.
> We can configure Amazon EC2 Auto Scaling to send events to CloudWatch Events whenever our Auto Scaling group scales.
![](https://i.imgur.com/joIrTNw.png)
The idea is that each time an instance is created by one of the Auto Scaling groups of our EKS cluster, the instance is automatically added to the NLB target group.
#### Create a Lambda function to automate adding instance in the NLB
Create a Lambda function that adds the instances launched by the Auto Scaling group to the NLB target group:
```
import json
import boto3
from pprint import pprint

LbName = "kube-dns-nlb"  # <- change according to your setup

print('Loading function')
elb = boto3.client('elbv2')


def find_lb_arn(name):
    # Describe the load balancer by name
    lbs_list_response = elb.describe_load_balancers(Names=[name])
    if lbs_list_response['ResponseMetadata']['HTTPStatusCode'] == 200:
        print("LBs list: " + ' '.join(lb['LoadBalancerName']
                                      for lb in lbs_list_response['LoadBalancers']))
        # We have only 1 lb
        lbArn = lbs_list_response['LoadBalancers'][0]['LoadBalancerArn']
    else:
        print("Describe lbs failed")
    return lbArn


def lambda_handler(event, context):
    print("AutoScalingEvent()")
    print("Debug Event data = " + json.dumps(event, indent=2))
    target_id = event['detail']['EC2InstanceId']
    print("We are going to add InstanceID = " + target_id)
    # Find the load balancer arn
    lbArn = find_lb_arn(LbName)
    print("lbArn=" + lbArn)
    # Register targets
    targets_list = [dict(Id=target_id)]
    describe_tg_response = elb.describe_target_groups(LoadBalancerArn=lbArn)
    # pprint(describe_tg_response)
    tgId = describe_tg_response['TargetGroups'][0]['TargetGroupArn']
    print("tgID = " + tgId)
    # Register the target in the target group
    reg_targets_response = elb.register_targets(TargetGroupArn=tgId, Targets=targets_list)
    if reg_targets_response['ResponseMetadata']['HTTPStatusCode'] == 200:
        print("Successfully registered targets")
    else:
        print("Register targets failed")
```
> You need to set **LbName** to the name of the NLB you want to add instances to.
Create the IAM role for our Lambda function:
```
NAME=add-instance-to-nlb
ACCOUNT_ID=$(aws sts get-caller-identity --output text --query 'Account')
ASSUMEPOLICY=$(echo -n '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "lambda.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}')
echo ACCOUNT_ID=$ACCOUNT_ID
echo ASSUMEPOLICY=$ASSUMEPOLICY
LAMBDA_ROLE_ARN=$(aws iam create-role \
--role-name $NAME \
--description "Role to allow Lambda function to manage NLB targets" \
--assume-role-policy-document "$ASSUMEPOLICY" \
--output text \
--query 'Role.Arn')
echo $LAMBDA_ROLE_ARN
```
Attach the policies to the role:
```
aws iam attach-role-policy \
--role-name $NAME \
--policy-arn arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
LAMBDA_POLICY=$(echo -n '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "logs:CreateLogGroup",
"Resource": "arn:aws:logs:us-east-1:'; echo -n "$ACCOUNT_ID"; echo -n ':*"
},
{
"Effect": "Allow",
"Action": [
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": [
"arn:aws:logs:us-east-1:'; echo -n "$ACCOUNT_ID"; echo -n ':log-group:/aws/lambda/'; echo -n "$NAME"; echo -n ':*"
]
}
]
}')
echo $LAMBDA_POLICY
LAMBDA_POLICY_ARN=$(aws iam create-policy \
--policy-name AWSLambdaBasicExecutionRole-$NAME \
--policy-document "$LAMBDA_POLICY" \
--output text \
--query 'Policy.Arn')
echo $LAMBDA_POLICY_ARN
aws iam attach-role-policy \
--role-name $NAME \
--policy-arn $LAMBDA_POLICY_ARN
```
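The shell-quoting gymnastics above splice `$ACCOUNT_ID` and `$NAME` into the policy JSON. An arguably safer alternative (a sketch, not from the original post; the account id below is a placeholder) builds the same document with `json.dumps`:

```python
import json

def make_lambda_log_policy(account_id: str, function_name: str,
                           region: str = "us-east-1") -> str:
    """Build the CloudWatch Logs policy document for the Lambda role
    as a JSON string, avoiding manual string splicing."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": f"arn:aws:logs:{region}:{account_id}:*",
            },
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": [
                    f"arn:aws:logs:{region}:{account_id}"
                    f":log-group:/aws/lambda/{function_name}:*"
                ],
            },
        ],
    }
    return json.dumps(policy)
```

The resulting string can be passed to `aws iam create-policy --policy-document` just like `$LAMBDA_POLICY`.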
Create the Lambda function (I've zipped the previous Lambda code as `add-instance-to-nlb.zip`):
```
aws lambda create-function \
--function-name $NAME \
--runtime python3.7 \
--zip-file fileb://~/environment/add-instance-to-nlb.zip \
--handler add-instance-to-nlb.lambda_handler \
--role $LAMBDA_ROLE_ARN
```
Test the function with the following sample JSON event:
```
{
"version": "0",
"id": "12345678-1234-1234-1234-123456789012",
"detail-type": "EC2 Instance Launch Successful",
"source": "aws.autoscaling",
"account": "123456789012",
"time": "yyyy-mm-ddThh:mm:ssZ",
"region": "us-west-2",
"resources": [
"auto-scaling-group-arn",
"instance-arn"
],
"detail": {
"StatusCode": "InProgress",
"Description": "Launching a new EC2 instance: i-12345678",
"AutoScalingGroupName": "my-auto-scaling-group",
"ActivityId": "87654321-4321-4321-4321-210987654321",
"Details": {
"Availability Zone": "us-west-2b",
"Subnet ID": "subnet-12345678"
},
"RequestId": "12345678-1234-1234-1234-123456789012",
"StatusMessage": "",
"EndTime": "yyyy-mm-ddThh:mm:ssZ",
"EC2InstanceId": "i-1234567890abcdef0",
"StartTime": "yyyy-mm-ddThh:mm:ssZ",
"Cause": "description-text"
}
}
```
> You should get an expected error saying:
> ```
> The following targets are not valid instances: 'i-1234567890abcdef0'
> ```
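The Lambda's first step, pulling `EC2InstanceId` out of the scaling event, can be exercised offline against the sample payload above (a trimmed-down sketch):

```python
import json

# A trimmed version of the sample "EC2 Instance Launch Successful" event
SAMPLE_EVENT = json.loads("""
{
  "source": "aws.autoscaling",
  "detail-type": "EC2 Instance Launch Successful",
  "detail": {"EC2InstanceId": "i-1234567890abcdef0"}
}
""")

def extract_instance_id(event: dict) -> str:
    """Return the launched instance id from an ASG launch event,
    exactly as the lambda_handler does."""
    return event["detail"]["EC2InstanceId"]
```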
#### Create CloudWatch Rule
Adapt the following event pattern with the names of the Auto Scaling groups you use. In my case, I have 3 Auto Scaling groups associated with my EKS cluster.
```
EVENT_PATTERN=$(echo -n '{
"source": [
"aws.autoscaling"
],
"detail-type": [
"EC2 Instance Launch Successful"
],
"detail": {
"AutoScalingGroupName": [
"eksctl-eksworkshop-eksctl-nodegroup-ng-spot-NodeGroup-1MCQMJAIUZCSS",
"eks-f8b8de05-e964-8e64-5043-60449f530a2b",
"eks-48b909fd-3aa4-c200-dcc0-cb8c5b637736"
]
}
}')
echo $EVENT_PATTERN
```
Create the CloudWatch rule
```
aws events put-rule \
--name $NAME \
--event-pattern "$EVENT_PATTERN"
```
Get the Lambda ARN:
```
LAMBDA_ARN=$(aws lambda get-function --function-name $NAME --query 'Configuration.FunctionArn' --output text)
echo $LAMBDA_ARN
```
Create the CloudWatch Event rule target:
```
RULE_TARGET=$(echo -n '[
{
"Id": "1",
"Arn": "'; echo -n "$LAMBDA_ARN"; echo -n '"
}
]')
echo $RULE_TARGET
```
Add the target to the event rule
```
aws events put-targets \
--rule $NAME \
--targets "$RULE_TARGET"
```
Add permission on the Lambda to be triggered by the Event
```
aws lambda add-permission \
--function-name $NAME \
--statement-id autoscaling-event-rule \
--action 'lambda:InvokeFunction' \
--principal events.amazonaws.com \
--source-arn $(aws events describe-rule --name $NAME --query 'Arn' --output text)
```
<details>
<summary>Or Using the Console:</summary>
We create a CloudWatch rule to associate our Auto Scaling groups with our Lambda function.
Select Auto Scaling as the event pattern source, and select all the Auto Scaling groups you want to monitor:
![](https://i.imgur.com/nnkorh7.png)
For the target, simply choose the Lambda function we just created.
![](https://i.imgur.com/ZEOR1CI.png)
</details>
### Testing
You can now test scaling your ASG out or in, or simply terminate some instances. Each time a new instance is created in one of the matching ASGs, the ASG sends an event to the Lambda, and the Lambda registers the new instance in the NLB target group.
<details>
<summary>Check this if you need to manually add instances with the cli</summary>
### Manually add instances
Here I will just show how to add instances to the NLB manually using the CLI. Normally you won't need this, except perhaps the first time, if you created the instances before setting up the automation above.
Retrieve the list of instances in your VPC that you want to add to the load balancer...
```
VPC_ID=vpc-027f50fc9d05149f0
INSTANCES=$(aws ec2 describe-instances --filters Name=network-interface.vpc-id,Values=$VPC_ID \
--query 'Reservations[*].Instances[*].InstanceId' \
--output text)
IDS=$(for x in `echo $INSTANCES`; do echo Id=$x ; done | tr '\n' ' ')
```
...or we can get the instances from the Kubernetes API:
```
INSTANCES=$(kubectl get nodes -o json | jq -r ".items[].spec.providerID" | cut -d'/' -f5)
IDS=$(for x in `echo $INSTANCES`; do echo Id=$x ; done | tr '\n' ' ')
echo $IDS
```
Manually add the NLB targets:
```
aws elbv2 register-targets \
--target-group-arn $(aws elbv2 describe-target-groups --query 'TargetGroups[?TargetGroupName==`kube-dns-tg`].TargetGroupArn' --output text) \
--targets $(echo $IDS)
```
This adds every instance from the $INSTANCES variable as a target of the **kube-dns-tg** target group.
Because the service is in NodePort mode, every instance listens on the node port and forwards traffic to the targeted service, in this case the kube-dns service, which itself targets the CoreDNS pods.
</details>
## Operations that can impact our target configuration
### Upgrading the version of NodeGroups
When upgrading the version of a nodegroup, EKS starts a rolling upgrade of each instance. Each old instance disappears from the NLB when it is deleted, and the new ones are automatically added by our CloudWatch Event rule and Lambda function.
## Cleanup
Delete the target group:
```
aws elbv2 delete-target-group --target-group-arn $TG_ARN
```
Delete the load balancer:
```
aws elbv2 delete-load-balancer --load-balancer-arn $LB_ARN
```