ECS Demo - Task Placement Constraints

---
tags: ECS, Containers, AWS, Workshop
---

# ECS Workshop - Task Placement Constraints

![alt text](https://raw.githubusercontent.com/awslabs/aws-icons-for-plantuml/main/source/unofficial/AWS-Architecture-Icons_SVG_20200430/SVG%20Light/_Group%20Icons/AWS-Cloud-alt_light-bg.svg "Tech Series" =20x20)

In this AWS workshop we will deploy a VPC network and an ECS cluster with capacity providers, then use task placement constraints to deploy ARM-based and GPU-based task definitions to the required container instance types.

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-placement-constraints.html

## Clone the repository

```bash=
cd ~/environment
git clone https://github.com/aws-containers/ecsworkshop-advanced-scheduling-chapter.git
cd ecsworkshop-advanced-scheduling-chapter
```

We define our deployment configuration as code using AWS CloudFormation. Let’s look through the code to better understand what resources CloudFormation will create.

## Deploying Networking CFN Stack

To start, we will deploy standard networking resources (a VPC with public and private subnets) using the following AWS CloudFormation (CFN) template, naming the stack `ecsworkshop-vpc`.

```bash=
aws cloudformation create-stack --stack-name=ecsworkshop-vpc --template-body file://ecsworkshop-vpc.yaml
```

Once the stack is ready (it has reached the `CREATE_COMPLETE` state), it exports the VPC ID, the security group, and the public and private subnet IDs. We will need these values when creating the EC2 instances. Run the following AWS CLI command, which calls the DescribeStacks API, to verify that the networking resources were created.

```bash=
aws cloudformation describe-stacks --stack-name ecsworkshop-vpc --query 'Stacks[*].Outputs' --output table
```

## Deploying the Cluster Resources

Next we will create the ECS cluster infrastructure, which we’ll dive into in more detail below.
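
Because the cluster stack depends on values exported by the networking stack, it can help to block until `ecsworkshop-vpc` has finished creating before moving on. The following is a minimal sketch built on the `aws cloudformation wait stack-create-complete` waiter. The function name `wait_for_stack` is our own and is not part of the workshop repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the workshop repo): block until a stack
# reaches CREATE_COMPLETE, then print its final status.
wait_for_stack() {
    local stack_name="$1"
    # The waiter polls DescribeStacks and returns non-zero on failure or timeout.
    if ! aws cloudformation wait stack-create-complete --stack-name "$stack_name"; then
        echo "Stack $stack_name did not reach CREATE_COMPLETE" >&2
        return 1
    fi
    aws cloudformation describe-stacks --stack-name "$stack_name" \
        --query 'Stacks[0].StackStatus' --output text
}

# Usage (requires AWS credentials):
#   wait_for_stack ecsworkshop-vpc
```
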
In this stack deployment, we import the VPC, security group, and subnets from the VPC base platform stack that we deployed above.

We first need to update the CFN template to the latest schema standards. Open `ecsworkshop-demo.yaml` for editing in your IDE (Cloud9).

On line 206, change

```
ArmECSAutoScalingGroup: "GpuECSAutoScalingGroup"
```

to

```
AutoScalingGroupName: "GpuECSAutoScalingGroup"
```

The capacity provider association also needs a default strategy. Near line 250, find

```yaml=
#Associate ECS Cluster Capacity Provider with both the ARM and CPU capacity provider.
ClusterCPAssociation:
  Type: "AWS::ECS::ClusterCapacityProviderAssociations"
  Properties:
    Cluster: !Ref ECSCluster
    CapacityProviders:
      - !Ref ArmECSCapacityProvider
      - !Ref GpuECSCapacityProvider
```

and update it to include a default strategy:

```yaml=
#Associate ECS Cluster Capacity Provider with both the ARM and CPU capacity provider.
ClusterCPAssociation:
  Type: "AWS::ECS::ClusterCapacityProviderAssociations"
  Properties:
    Cluster: !Ref ECSCluster
    CapacityProviders:
      - !Ref ArmECSCapacityProvider
      - !Ref GpuECSCapacityProvider
    DefaultCapacityProviderStrategy:
      - Base: 1
        Weight: 1
        CapacityProvider: !Ref ArmECSCapacityProvider
```

Save the file in Cloud9. Now we can create the stack from the CFN template:

```bash=
aws cloudformation create-stack --stack-name=ecs-demo --template-body file://ecsworkshop-demo.yaml --capabilities CAPABILITY_NAMED_IAM
```

Review the resulting outputs:

```bash=
aws cloudformation describe-stacks --stack-name ecs-demo --query 'Stacks[*].Outputs' --output table
```

# Review the CFN Code

The goal of this workshop is to schedule tasks onto EC2 instances that match their requirements. The two use cases we’re working with are deploying ARM-based containers and deploying containers that require GPUs. Let’s start by reviewing the ARM infrastructure.
## ARM Cluster Capacity and Task Definition

To deploy ARM-based EC2 instances to our cluster, we need to create some resources that get the instances up and running and connected to the cluster. We start with the launch configuration, where we specify the AMI ID, the security group and IAM role details, and finally the user data, which runs the code we define inline. The following line in the user data registers the EC2 instance with the cluster.

```bash=
echo ECS_CLUSTER=${ECSCluster} >> /etc/ecs/ecs.config
```

```yaml=
# ARM64 based Launch Configuration.
ArmASGLaunchConfiguration:
  Type: AWS::AutoScaling::LaunchConfiguration
  Properties:
    ImageId: !Ref ArmLatestAmiId
    SecurityGroups: !Split
      - ','
      - Fn::ImportValue: !Sub "${VPCStackParameter}-SecurityGroups"
    InstanceType: !Ref 'ArmInstanceType'
    IamInstanceProfile: !Ref 'EC2InstanceProfile'
    UserData:
      Fn::Base64: !Sub |
        #!/bin/bash -xe
        echo ECS_CLUSTER=${ECSCluster} >> /etc/ecs/ecs.config
        yum install -y aws-cfn-bootstrap
        /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource ArmECSAutoScalingGroup --region ${AWS::Region}
```

Next, we create an Auto Scaling group, which contains a collection of Amazon EC2 instances that are treated as a logical grouping for the purposes of automatic scaling and management. An Auto Scaling group also enables you to use Amazon EC2 Auto Scaling features such as health check replacements and scaling policies.

```yaml=
# AutoScalingGroup to launch Container Instances using ARM64 Launch Configuration.
ArmECSAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    NewInstancesProtectedFromScaleIn: true
    VPCZoneIdentifier: !Split
      - ','
      - Fn::ImportValue: !Sub "${VPCStackParameter}-PrivateSubnetIds"
    LaunchConfigurationName: !Ref 'ArmASGLaunchConfiguration'
    MinSize: '0'
    MaxSize: !Ref 'MaxSize'
    DesiredCapacity: !Ref 'DesiredCapacity'
    Tags:
      - Key: Name
        Value: !Sub 'ARM64-${ECSCluster}'
        PropagateAtLaunch: true
  CreationPolicy:
    ResourceSignal:
      Timeout: PT15M
  UpdatePolicy:
    AutoScalingReplacingUpdate:
      WillReplace: true
```

For scaling and management of the Auto Scaling group, we create a capacity provider with cluster auto scaling enabled. This ensures that EC2 instances are launched and terminated as tasks require them. We associate the capacity provider with the Auto Scaling group we created above, which ECS then controls as scaling is needed.

```yaml=
ArmECSCapacityProvider:
  Type: AWS::ECS::CapacityProvider
  Properties:
    AutoScalingGroupProvider:
      AutoScalingGroupArn: !Select [0, !GetAtt ARMCustomResource.Data ]
      ManagedScaling:
        MaximumScalingStepSize: 10
        MinimumScalingStepSize: 1
        Status: ENABLED
        TargetCapacity: 100
      ManagedTerminationProtection: ENABLED
```

A task placement constraint is a rule that ECS considers during task placement. Of the two supported constraint types, we will use `memberOf`, with an expression on the `ecs.cpu-architecture` attribute to place tasks on container instances with the desired CPU architecture. The task definition below places tasks on the ARM64 architecture.

Finally we deploy our task definition, which tells Amazon ECS how we want to launch our containers. In this task definition we define our container image, CPU and memory requirements, logging configuration, and the placement constraints. The placement constraints directive gives us more control over where tasks land when launched.
With ECS there are two types of constraints that can be used:

1. **distinctInstance** - Place each task on a different container instance. This task placement constraint can be specified when either running a task or creating a new service.
2. **memberOf** - Place tasks on container instances that satisfy an expression. For more information about the expression syntax for constraints, see [Cluster query language](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cluster-query-language.html).

In the task definition below we use the `memberOf` constraint with an expression that queries the built-in [attribute](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-placement-constraints.html#attributes) `ecs.cpu-architecture`. Using the cluster query language, we instruct ECS to schedule these tasks only onto EC2 instances with the arm64 architecture.

```yaml=
# ECS Task Definition for ARM64 Instance type. PlacementConstraints properties are setting the desired cpu-architecture to arm64.
Arm64taskdefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: !Join ['', [!Ref 'AWS::StackName', -arm64]]
    PlacementConstraints:
      - Type: memberOf
        Expression: 'attribute:ecs.cpu-architecture == arm64'
    ContainerDefinitions:
      - Name: simple-arm64-app
        Cpu: 10
        Command:
          - sh
          - '-c'
          - 'uname -a'
        Essential: true
        Image: public.ecr.aws/amazonlinux/amazonlinux:latest
        Memory: 200
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref 'CloudwatchLogsGroup'
            awslogs-region: !Ref 'AWS::Region'
            awslogs-stream-prefix: ecs-arm64-demo-app
```

## GPU Cluster Capacity and Task Definition

We define the resources for GPU-enabled instances in the same way as the ARM resources above. The main difference here is the AMI used.
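
To make the AMI difference concrete, here is a hedged sketch that looks up ECS-optimized AMI IDs from the public SSM parameters AWS publishes for Amazon Linux 2. That the workshop template resolves `ArmLatestAmiId` and `GPULatestAmiId` from these parameters is an assumption on our part, and the function names `ecs_ami_parameter` and `ecs_ami_id` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a variant name to the public SSM parameter that
# holds the recommended ECS-optimized Amazon Linux 2 AMI ID.
ecs_ami_parameter() {
    case "$1" in
        arm64|gpu) echo "/aws/service/ecs/optimized-ami/amazon-linux-2/$1/recommended/image_id" ;;
        x86_64)    echo "/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id" ;;
        *)         echo "unknown variant: $1" >&2; return 1 ;;
    esac
}

# Resolve the AMI ID for a variant in the current region (needs AWS credentials).
ecs_ami_id() {
    local name
    name=$(ecs_ami_parameter "$1") || return 1
    aws ssm get-parameters --names "$name" \
        --query 'Parameters[0].Value' --output text
}

# Offline demonstration: print the parameter names only, without calling AWS.
for variant in arm64 gpu; do
    ecs_ami_parameter "$variant"
done
```

With credentials configured, `ecs_ami_id arm64` and `ecs_ami_id gpu` should return two different image IDs for the same region.
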
```yaml=
# GPU based Launch Configuration.
GpuASGLaunchConfiguration:
  Type: AWS::AutoScaling::LaunchConfiguration
  Properties:
    ImageId: !Ref GPULatestAmiId
    SecurityGroups: !Split
      - ','
      - Fn::ImportValue: !Sub "${VPCStackParameter}-SecurityGroups"
    InstanceType: !Ref 'GpuInstanceType'
    IamInstanceProfile: !Ref 'EC2InstanceProfile'
    UserData:
      Fn::Base64: !Sub |
        #!/bin/bash -xe
        echo ECS_CLUSTER=${ECSCluster} >> /etc/ecs/ecs.config
        yum install -y aws-cfn-bootstrap
        /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource GpuECSAutoScalingGroup --region ${AWS::Region}

# AutoScalingGroup to launch Container Instances using the GPU Launch Configuration.
GpuECSAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    NewInstancesProtectedFromScaleIn: true
    VPCZoneIdentifier: !Split
      - ','
      - Fn::ImportValue: !Sub "${VPCStackParameter}-PrivateSubnetIds"
    LaunchConfigurationName: !Ref 'GpuASGLaunchConfiguration'
    MinSize: '0'
    MaxSize: !Ref 'MaxSize'
    DesiredCapacity: !Ref 'DesiredCapacity'
    Tags:
      - Key: Name
        Value: !Sub 'GPU-${ECSCluster}'
        PropagateAtLaunch: true
  CreationPolicy:
    ResourceSignal:
      Timeout: PT15M
  UpdatePolicy:
    AutoScalingReplacingUpdate:
      WillReplace: true

# Capacity provider with managed scaling for the GPU Auto Scaling group.
GpuECSCapacityProvider:
  Type: AWS::ECS::CapacityProvider
  Properties:
    AutoScalingGroupProvider:
      AutoScalingGroupArn: !Select [1, !GetAtt ARMCustomResource.Data ]
      ManagedScaling:
        MaximumScalingStepSize: 10
        MinimumScalingStepSize: 1
        Status: ENABLED
        TargetCapacity: 100
      ManagedTerminationProtection: ENABLED
```

As with the ARM64 task definition, we use a `memberOf` constraint, this time on the `ecs.instance-type` attribute, to place tasks on container instances of the desired instance type.

Note: For tasks that require GPU support, we use the `ecs.instance-type` attribute because GPUs are available only on specific instance types. We can choose the desired instance type from the list of declared/supported instance types when creating the AWS CloudFormation stack.

In the task definition below we define a container image that requires a GPU to run.
Like the ARM task, we use the `memberOf` placement constraint, but the query is a little different. Here we place the tasks based on instance type, as they cannot run without GPUs available. We could also create a custom attribute and query on that if we wanted to use multiple instance types. You may also notice that the container requests a GPU under the `ResourceRequirements` key. This ensures that a GPU is assigned to the task when it is launched.

```yaml=
# ECS Task Definition for GPU Instance type. PlacementConstraints properties are setting the desired instance-type to the GPU instance type.
Gputaskdefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: !Join ['', [!Ref 'AWS::StackName', -gpu]]
    PlacementConstraints:
      - Type: memberOf
        Expression: !Sub 'attribute:ecs.instance-type == ${GpuInstanceType}'
    ContainerDefinitions:
      - Name: simple-gpu-app
        Cpu: 100
        Essential: true
        Image: nvidia/cuda:11.0-base
        Memory: 80
        ResourceRequirements:
          - Type: GPU
            Value: '1'
        Command:
          - sh
          - '-c'
          - nvidia-smi
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref 'CloudwatchLogsGroup'
            awslogs-region: !Ref 'AWS::Region'
            awslogs-stream-prefix: ecs-gpu-demo-app
```

# Confirm resources are deployed

Run the following to show the cluster resources deployed in your account:

```bash=
aws cloudformation describe-stacks --stack-name ecs-demo --query 'Stacks[*].Outputs' --output table
```

At this point we have deployed the base platform and are ready to run some containers. Let’s move on to deploying ECS tasks in the shared ECS cluster using the task definitions we created.

# Task Placement Validation

To run an ECS task on an ARM-based EC2 instance, we need to provide three input parameters to the `run-task` CLI command, which calls the `RunTask` API: the ECS cluster, the task definition, and the capacity provider strategy. We set the `ARM_TASKDEF` shell variable from the CloudFormation output value.
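
The workshop commands extract an output value by piping text output through `awk '{print $NF}'`, which keeps the last whitespace-separated column. As an alternative sketch, the value can be selected directly in the `--query` expression, which avoids depending on column order. The helper name `stack_output` is hypothetical and not part of the workshop repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the workshop repo): print the value of one
# CloudFormation stack output, selected by its OutputKey.
stack_output() {
    local stack_name="$1" output_key="$2"
    aws cloudformation describe-stacks --stack-name "$stack_name" \
        --query "Stacks[0].Outputs[?OutputKey=='${output_key}'].OutputValue" \
        --output text
}

# Usage (requires AWS credentials and the ecs-demo stack):
#   ARM_TASKDEF=$(stack_output ecs-demo Armtaskdef)
#   echo "$ARM_TASKDEF"
```
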
When we created the ARM-based task definition using CloudFormation, we configured its task placement constraints. With this configuration, the ECS scheduler checks the CPU architecture of each container instance and places the task only on instances whose architecture is arm64; otherwise the task is not placed on any container instance in the cluster. We can confirm the placement constraint configuration by describing the ARM-based task definition and querying the output for the `taskDefinition.placementConstraints` value. A successful result also confirms that `ARM_TASKDEF` is set correctly.

```bash=
ARM_TASKDEF=$(aws cloudformation describe-stacks --stack-name ecs-demo \
    --query 'Stacks[*].Outputs[?OutputKey==`Armtaskdef`]' --output text | awk '{print $NF}')
aws ecs describe-task-definition --task-definition "$ARM_TASKDEF" \
    --query 'taskDefinition.placementConstraints'
```

Likewise, we set the `GPU_TASKDEF` shell variable by querying the CloudFormation stack output for the GPU task definition. When we created the GPU-based task definition, we configured the placement constraint to require an `instance-type` equal to `p2.xlarge`. We confirm this by running the describe-task-definition command against `GPU_TASKDEF`.

```bash=
GPU_TASKDEF=$(aws cloudformation describe-stacks --stack-name ecs-demo \
    --query 'Stacks[*].Outputs[?OutputKey==`Gputaskdef`]' --output text | awk '{print $NF}')
echo "$GPU_TASKDEF"
aws ecs describe-task-definition --task-definition "$GPU_TASKDEF" \
    --query 'taskDefinition.placementConstraints'
```

When a user launches an ECS task (via CreateService or RunTask), the ECS scheduler reads the constraints field in the task definition and places the task on a container instance in the ECS cluster that matches the constraints.
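
To preview which container instances would match a constraint, the same cluster query language expression can be passed to `aws ecs list-container-instances --filter`. This is a sketch only: the helper name `matching_instances` and the two variable names are hypothetical. Note that this workshop's cluster starts with zero instances, so the list stays empty until capacity has been provisioned.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the container instances in a cluster that satisfy
# a cluster query language expression (the same syntax memberOf uses).
matching_instances() {
    local cluster="$1" expression="$2"
    aws ecs list-container-instances --cluster "$cluster" \
        --filter "$expression" \
        --query 'containerInstanceArns' --output text
}

# Expressions equivalent to the two task definitions' constraints.
# p2.xlarge is the GPU instance type mentioned in this workshop.
ARM_EXPR='attribute:ecs.cpu-architecture == arm64'
GPU_EXPR='attribute:ecs.instance-type == p2.xlarge'

# Usage (requires AWS credentials and the cluster name):
#   matching_instances "$ECS_CLUSTER" "$ARM_EXPR"
#   matching_instances "$ECS_CLUSTER" "$GPU_EXPR"
```
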
If none of the available container instances fulfills the requirement, the ECS scheduler will not be able to place the task.

# Validation under success scenario

First, set the `ECS_CLUSTER` variable from the stack outputs. The `echo` confirms that the value is set correctly.

```bash=
ECS_CLUSTER=$(aws cloudformation describe-stacks --stack-name ecs-demo \
    --query 'Stacks[*].Outputs[?OutputKey==`ecscluster`]' --output text | awk '{print $NF}')
echo "$ECS_CLUSTER"
```

Now it’s time to get our tasks deployed! Run the following commands to deploy an ARM-based task to our ECS cluster.

```bash=
ARM_ECSCP=$(aws cloudformation describe-stacks --stack-name ecs-demo \
    --query 'Stacks[*].Outputs[?OutputKey==`ArmECSCapacityProvider`]' --output text | awk '{print $NF}')
echo "$ARM_ECSCP"
aws ecs run-task --cluster "$ECS_CLUSTER" --task-definition "$ARM_TASKDEF" --capacity-provider-strategy "capacityProvider=${ARM_ECSCP}"
```

Likewise, deploy a GPU-based task.

```bash=
GPU_ECSCP=$(aws cloudformation describe-stacks --stack-name ecs-demo \
    --query 'Stacks[*].Outputs[?OutputKey==`GpuECSCapacityProvider`]' --output text | awk '{print $NF}')
echo "$GPU_ECSCP"
aws ecs run-task --cluster "$ECS_CLUSTER" --task-definition "$GPU_TASKDEF" --capacity-provider-strategy "capacityProvider=${GPU_ECSCP}"
```

Each command returns a JSON response indicating that a task was submitted. Check the `failures` field; if it is empty, the task was submitted successfully. Also check that the task’s `lastStatus` field in the JSON output is `PROVISIONING`. It will take a couple of minutes for cluster auto scaling to kick in. We start with zero instances in our cluster and let ECS handle the scaling of the EC2 instances based on demand.

Navigate to the ECS console and select the `ecs-demo-ECSCluster-RANDOM` cluster (where RANDOM is a generated suffix). You will see that the current size of the `ArmECSCapacityProvider` and `GpuECSCapacityProvider` capacity providers has increased to 1 each.
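
The same checks can be made from the command line. The sketch below wraps `aws ecs run-task` so that it prints only the task ARN and fails when no task was created, and adds a small status lookup based on `aws ecs describe-tasks`. Both function names are hypothetical and not part of the workshop repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one task and print its ARN, or fail if the
# RunTask response contains no task (for example on a placement failure).
run_task_checked() {
    local cluster="$1" taskdef="$2" capacity_provider="$3" task_arn
    task_arn=$(aws ecs run-task --cluster "$cluster" \
        --task-definition "$taskdef" \
        --capacity-provider-strategy "capacityProvider=${capacity_provider}" \
        --query 'tasks[0].taskArn' --output text) || return 1
    # With --output text, a missing (null) value is printed as "None".
    if [ -z "$task_arn" ] || [ "$task_arn" = "None" ]; then
        echo "RunTask returned no task; inspect the 'failures' field" >&2
        return 1
    fi
    echo "$task_arn"
}

# Print the task's current lastStatus (for example PROVISIONING or RUNNING).
task_status() {
    aws ecs describe-tasks --cluster "$1" --tasks "$2" \
        --query 'tasks[0].lastStatus' --output text
}

# Usage (requires AWS credentials and the variables set earlier):
#   TASK_ARN=$(run_task_checked "$ECS_CLUSTER" "$ARM_TASKDEF" "$ARM_ECSCP")
#   task_status "$ECS_CLUSTER" "$TASK_ARN"
```
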
![](https://hackmd.io/_uploads/rJNrCM9A2.png)

A task will be registered in a pending state while infrastructure is provisioned to meet the capacity requirements.

![](https://hackmd.io/_uploads/ryT-AzcCh.png)

In the ECS Container Instances tab, you will see that you now have two EC2 instances (one for each task). On the same tab, click the settings icon and select `ecs.cpu-architecture` and `ecs.instance-type`. Based on these values, you will see that the capacity providers launched both a GPU instance (instance-type=p2.xlarge) and an ARM instance (cpu-architecture=arm64), as required by the respective task definition constraints.

![](https://hackmd.io/_uploads/HkvPCG903.png)

Click the container instance to view its settings in ECS.

![](https://hackmd.io/_uploads/B1HARMqC3.png)

Click through on the instance ID to view the EC2 settings and see that it has provisioned a t4g.micro (ARM) based instance type.

![](https://hackmd.io/_uploads/HyRbJm90n.png)

That’s it, we have successfully deployed two tasks onto our cluster based on specific placement constraints!

In the ECS console, navigate to Tasks, check the boxes for the running tasks, and select Stop. After approximately 15 minutes, cluster auto scaling will terminate the EC2 instances, as no tasks require them.

Let’s move on to the cleanup step to delete all of the resources.

# Delete Resources

Delete the CloudFormation stacks created for this workshop.

```bash=
# Turn off scale-in protection for new instances in both Auto Scaling groups.
aws autoscaling update-auto-scaling-group --auto-scaling-group-name GpuECSAutoScalingGroup --no-new-instances-protected-from-scale-in
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ArmECSAutoScalingGroup --no-new-instances-protected-from-scale-in
aws cloudformation delete-stack --stack-name ecs-demo
# ecs-demo imports values exported by ecsworkshop-vpc, so wait for it to be
# deleted before deleting the networking stack.
aws cloudformation wait stack-delete-complete --stack-name ecs-demo
aws cloudformation delete-stack --stack-name ecsworkshop-vpc
```