Trustsoft K8s 1-2.12.23
===

# Local setup

* https://kubernetes.io/docs/tutorials/stateful-application/mysql-wordpress-persistent-volume/
* https://github.com/bitnami/charts/tree/main/bitnami/wordpress/templates
* https://bitnami.com/stack/wordpress/helm

## Minikube

Start with `minikube`:

* https://docs.google.com/presentation/d/1bX3UGMQRuBjkZW89urH-xvz0rOBXbD8q3rmwYW8JQZw/edit?usp=sharing

Now deploy WordPress:

```
helm install wp oci://registry-1.docker.io/bitnamicharts/wordpress
helm list
```

Check the documentation of the values in the [chart README](https://github.com/bitnami/charts/tree/main/bitnami/wordpress/#installing-the-chart).

Check the ingress setup:

```
minikube addons enable ingress
```

Upgrade with custom values:

```
helm upgrade --install wp oci://registry-1.docker.io/bitnamicharts/wordpress -f wp/values.yml
```

Check the WordPress admin password:

```
kubectl get secret --namespace default wp-wordpress -o jsonpath="{.data.wordpress-password}" | base64 -d
```

Monitoring\*:

```
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install -f prometheus/only-prometheus.yml prometheus prometheus-community/prometheus
```

\* we'll probably use the Prometheus from the last session.

Check the Prometheus server:

```
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=prometheus,app.kubernetes.io/instance=prometheus" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $POD_NAME 9090
```

Pull vs. push metrics model:

* traces: push/pull?
* metrics: push/pull?
* IoT: push/pull?

```
minikube addons configure registry-creds
minikube service wp-wordpress --url
```

Benchmark:

```
export WP_HOSTNAME=$(minikube service wp-wordpress --url | head -n 1)
k6 run k6/index.js
```

Disable caching in the values and benchmark again. Inconclusive, we'll see in the cloud.

Volumes!
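The `wp/values.yml` used above isn't included in the notes. Here is a minimal sketch of what it might contain, assuming the Bitnami chart's `ingress.enabled` and `wordpressConfigureCache` parameters (hypothetical contents; verify the parameter names against the chart README for your chart version):

```shell
# Write a sketch of the values file (to /tmp so nothing real is overwritten).
cat <<'EOF' > /tmp/values-sketch.yml
# hypothetical values for the Bitnami WordPress chart
ingress:
  enabled: true
  hostname: wp.local
# flip to false for the "disable caching and benchmark again" step
wordpressConfigureCache: true
EOF

# Then, for example:
# helm upgrade --install wp oci://registry-1.docker.io/bitnamicharts/wordpress -f /tmp/values-sketch.yml
```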
* Add a picture
* Kill the Pod
* Check the WordPress site again
* Resize the claim

```
only dynamically provisioned pvc can be resized and the storageclass that provisions the pvc must support resize
```

Check `kubectl edit storageclass standard`:

```
63s  Warning  ExternalExpanding  persistentvolumeclaim/wp-wordpress  waiting for an external controller to expand this PVC
```

It could not be done as simply as needed.

# Deploy to EKS

```
terraform apply -target=module.vpc
terraform apply -target=module.vpc_cni_irsa
terraform apply -target=module.eks
```

`-target` is not good practice:

```
Warning: Applied changes may be incomplete

The plan was created with the -target option in effect, so some changes requested
in the configuration may have been ignored and the output values may not be fully
updated. Run the following command to verify that no other changes are pending:
    terraform plan
```

Check out the cluster:

```
aws eks update-kubeconfig --region eu-west-1 --name beranm-testing-01 --profile trustsoft
```

Deploy WordPress:

```
helm upgrade --install wp oci://registry-1.docker.io/bitnamicharts/wordpress -f wp/values.yml
```

Check pending pods:

```
kubectl get pods --field-selector status.phase=Pending
```

and see:

```
running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
```

Add:

```
module "ebs_csi_irsa_role" {
  source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"

  role_name             = "${local.name}-ebs-csi"
  attach_ebs_csi_policy = true

  oidc_providers = {
    ex = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }
}

...

  # this entry goes under the EKS module's cluster_addons map
  # (placement assumed from the terraform-aws-eks module examples)
  aws-ebs-csi-driver = {
    service_account_role_arn = module.ebs_csi_irsa_role.iam_role_arn
    most_recent              = true
  }
```

Run:

```
terraform apply -target=module.vpc -target=module.vpc_cni_irsa -target=module.eks -target=module.ebs_csi_irsa_role
```

Check https://eu-west-1.console.aws.amazon.com/ec2/home?region=eu-west-1#Volumes:

And now resize the claim. Allow resizing on the storage class:

```
allowVolumeExpansion: true
```

How about scaling the deployment? Will it work? Check the error on the HPA:

```
the HPA was unable to compute the replica count: failed to get memory utilization:
unable to get metrics for resource memory: unable to fetch metrics from resource
metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
```

Try to upload the picture again?

Let's scale to multiple nodes and change scaling to prefer different nodes:

```
0/2 nodes are available: 2 node(s) had volume node affinity conflict.
preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
```

See https://stackoverflow.com/questions/51946393/kubernetes-pod-warning-1-nodes-had-volume-node-affinity-conflict

Check https://eu-west-1.console.aws.amazon.com/ec2/home?region=eu-west-1#Volumes:

Optional storage class:

```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: kubernetes.io/aws-ebs
parameters:
  fsType: ext4
  type: gp3
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - eu-west-1a
    - eu-west-1b
    - eu-west-1c
```

This works on instance termination, but not for drains and cordons. Let's try; 10 minutes is not a problem.

Check https://pet2cattle.com/2022/04/volume-affinity-conflict — the PVC has zones.
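The "volume node affinity conflict" above comes from the PV itself: a dynamically provisioned EBS volume gets a `nodeAffinity` pinning it to the availability zone it was created in, so pods using the claim can only schedule onto nodes in that zone. A sketch of the relevant PV fragment (illustrative, not dumped from a live cluster):

```shell
# Write an illustrative PV nodeAffinity fragment showing why scheduling is pinned.
cat <<'EOF' > /tmp/pv-affinity-example.yml
# fragment of a dynamically provisioned EBS PV spec (illustrative)
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - eu-west-1a
EOF

# Compare against the nodes' zone labels to spot the mismatch:
# kubectl get nodes -L topology.kubernetes.io/zone
# kubectl get pv -o yaml | grep -B1 -A4 nodeAffinity
```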
Change the AZ setup of the default node group:

```
eks_managed_node_groups = {
  default_node_group = {
    use_custom_launch_template = false
    disk_size                  = 50
    desired_size               = 2
    min_size                   = 2
    subnet_ids                 = [module.vpc.private_subnets[0]]
  }
}
```

Limit to a single AZ and kill the node again. How about multi-AZ scaling?

## Enable EFS and try again

```
module "efs_csi_irsa_role" {
  source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"

  role_name             = "${local.name}-efs-csi"
  attach_efs_csi_policy = true

  oidc_providers = {
    ex = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:efs-csi-controller-sa"]
    }
  }
}
```

and add to the EKS module:

```
...
eks_managed_node_group_defaults = {
  ami_type       = "AL2_x86_64"
  instance_types = ["t3.medium"]
  capacity_type  = "SPOT"

  iam_role_attach_cni_policy = true
  vpc_security_group_ids     = [aws_security_group.eks.id]
}
...
aws-efs-csi-driver = {
  service_account_role_arn = module.efs_csi_irsa_role.iam_role_arn
  most_recent              = true
}
```

and enable `efs.tf`.

Deploy the new storage class:

```
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0bc88e6ad43d533f1
  directoryPerms: "700"
```

Change the storage class in the values. Let's:

* scale
* benchmark
* enable caching, ...

## QoS classes

Pod priorities for scheduling:

* check the [slides](https://docs.google.com/presentation/d/1T3c2_C5H6hSR6k4CPN-X67Nx_T6hKhYQh-QKLB1_ntY/edit#slide=id.g1c8c3191096_0_440)

```
helm upgrade --create-namespace -n wp-guaranteed --install wp-guaranteed oci://registry-1.docker.io/bitnamicharts/wordpress -f wp/guaranteed-values.yml
```

```
helm upgrade --create-namespace -n wp-best-efford --install wp-best-efford oci://registry-1.docker.io/bitnamicharts/wordpress -f wp/best-efford-values.yml
```

And now k6 both of them. Do not use a priority class unless it's entirely needed.
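The contents of `wp/guaranteed-values.yml` and `wp/best-efford-values.yml` aren't shown in the notes; the resource settings below are an assumption about what they'd contain. Guaranteed QoS requires requests equal to limits on every container in the pod; BestEffort requires no requests or limits at all:

```shell
# Sketch the two values files (written to /tmp; the real files live under wp/).
cat <<'EOF' > /tmp/guaranteed-values.yml
# requests == limits on the WordPress container -> Guaranteed QoS
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 512Mi
EOF

cat <<'EOF' > /tmp/best-efford-values.yml
# no requests and no limits -> BestEffort QoS
resources: {}
EOF

# Verify what the scheduler actually assigned:
# kubectl get pod -n wp-guaranteed <pod> -o jsonpath='{.status.qosClass}'
```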
## Karpenter

[Example](https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/karpenter/main.tf)

## KEDA

[Scalers](https://keda.sh/docs/2.12/scalers/)
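As a concrete starting point for KEDA, here is a minimal `ScaledObject` that could drive the WordPress deployment from the Prometheus installed earlier; the server address, query, and threshold are assumptions for illustration, not tested values:

```shell
# Write a minimal KEDA ScaledObject manifest (illustrative values).
cat <<'EOF' > /tmp/wp-scaledobject.yml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: wp-wordpress
spec:
  scaleTargetRef:
    name: wp-wordpress          # deployment created by the wp Helm release
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
  - type: prometheus            # see the KEDA Prometheus scaler docs
    metadata:
      serverAddress: http://prometheus-server.default.svc
      query: sum(rate(apache_accesses_total[2m]))   # hypothetical metric
      threshold: "50"
EOF

# kubectl apply -f /tmp/wp-scaledobject.yml
```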