ECS - Incident Response and Forensics

When security incidents occur, your ability to react swiftly is critical to minimizing the damage. In this post, we walk through an example that shows how to isolate an Amazon ECS workload so that you can perform forensic investigation and root cause analysis.

Establishing a reliable alerting system that promptly notifies you of suspicious activity (for example, [Amazon GuardDuty Runtime Monitoring for ECS](https://)) is the foundational step in crafting an effective incident response plan. In the event of an incident, you'll need to decide quickly whether to terminate and replace the affected container or to isolate and examine it. If you choose to isolate the container for forensic investigation and root cause analysis, follow the set of activities below.

# Example Incident Response Plan

## Reduce the Blast Radius

By isolating the affected ECS container instance or Fargate task, you instruct the ECS scheduler to stop placing tasks on the affected instance. This isolation allows you to take the compromised artifacts offline for forensic analysis without disrupting other tasks running in the cluster.

1. **Identify the compromised ECS container instance or Fargate task:** Identify the specific ECS container instance or Fargate task that you suspect has been compromised.
2. **Update container instance metadata:** Mark the compromised container instance for forensic analysis, for example with a `forensic=true` tag and a matching custom ECS attribute.
3. **Adjust the ECS service or task definition:** Add placement constraints that exclude container instances marked `forensic=true`, so that the scheduler stops placing new tasks on the compromised instance.
4. **Isolate the task with a security group:** Attach a security group that has no inbound or outbound rules to the compromised task, so that all traffic to and from it is denied.
5. **Remove IAM permissions:** Remove the permissions granted by the task's or instance's IAM role to prevent further damage from the attack.
6. **Revoke temporary security credentials:** Removing a role from a task or instance doesn't invalidate temporary credentials that have already been issued; they stay valid until they expire. Revoke the role's active sessions so that those credentials are denied as well.
7. **Update the Auto Scaling group (if applicable):** If an Auto Scaling group manages your ECS container instances, enable scale-in protection for the compromised instance (or otherwise adjust the group) so that it isn't terminated or replaced automatically.
8. **Remove the ECS container instance:** After the ECS service or task definition has been adjusted, you can remove (deregister) the compromised container instance from your cluster without disrupting other workloads. Deregistering doesn't terminate the EC2 instance, so it remains available for analysis.
9. **Forensic analysis:** Once the compromised container instance has been isolated, you can proceed with your forensic analysis without affecting other workloads in your ECS cluster. Because a deny-all security group also blocks management traffic, decide in advance how you will reach the instance for analysis, for example by permitting only the outbound HTTPS traffic that AWS Systems Manager needs, or by analyzing EBS snapshots from a separate forensic instance.

Keep in mind that the specific steps may vary depending on your ECS setup, and you should adapt these instructions to your environment. Isolating an ECS container instance is an important security measure to prevent potential threats from spreading, and it allows you to conduct a thorough forensic investigation. Here are the tasks in more detail.

# Identify the Compromised Task and EC2 Instance

Your initial course of action should be to pinpoint the affected task. Determine where the breach occurred, then identify the specific task and its associated EC2 instance so that you can isolate them from the rest of your infrastructure.
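Once you have the ARN of a suspect task, you can resolve the EC2 instance that hosts it in two calls: the task description records the container instance the task was placed on, and the container instance description records the EC2 instance ID. The following is a minimal sketch that wraps those two calls in a shell function; the function name `find_task_host` is hypothetical, and it assumes the EC2 launch type, because Fargate tasks have no container instance.

```bash
# Hypothetical helper: resolve the EC2 instance ID that hosts a running ECS task.
# Assumes the EC2 launch type; Fargate tasks have no container instance.
# Usage: find_task_host <cluster-name> <task-arn>
find_task_host() {
  local cluster="$1" task_arn="$2" ci_arn

  # The task description records the container instance the task was placed on
  ci_arn=$(aws ecs describe-tasks --cluster "$cluster" --tasks "$task_arn" \
    --query 'tasks[0].containerInstanceArn' --output text) || return 1

  # With --output text, a missing field is printed as "None"
  if [ -z "$ci_arn" ] || [ "$ci_arn" = "None" ]; then
    echo "No container instance found for $task_arn (Fargate task?)" >&2
    return 1
  fi

  # The container instance description maps to the underlying EC2 instance ID
  aws ecs describe-container-instances --cluster "$cluster" \
    --container-instances "$ci_arn" \
    --query 'containerInstances[0].ec2InstanceId' --output text
}
```

For example, `find_task_host my-cluster "$task_arn"` prints the instance ID (such as `i-0123456789abcdef0`) that you can then use in the tagging, isolation, and termination protection steps.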
## Identify Compromised Tasks and EC2 Instances Using the Task Definition Name

If you know the task definition family (name) of the compromised task, list its running tasks. Each task's description then records the container instance, and therefore the EC2 instance, that it runs on.

```bash=
# Replace <your-cluster-name> with your cluster and <task-name> with the task definition family
aws ecs list-tasks --cluster <your-cluster-name> --desired-status RUNNING --family <task-name> --query 'taskArns' --output text
```

## Identify Compromised Tasks and EC2 Instances Using IAM Roles or Image Vulnerabilities

In some cases, you may find that an IAM role or a container image is compromised, in which case tasks that use the role or image may also be compromised. Identify those tasks with the following commands.

For IAM roles (the task role and the task execution role are defined in the task definition, so each running task's definition is checked):

```bash=
# Replace <your-cluster-name> and <IAM-role-name>; matches either the task role or the task execution role
for task_arn in $(aws ecs list-tasks --cluster <your-cluster-name> --desired-status RUNNING --query 'taskArns' --output text); do
  td_arn=$(aws ecs describe-tasks --cluster <your-cluster-name> --tasks "$task_arn" --query 'tasks[0].taskDefinitionArn' --output text)
  aws ecs describe-task-definition --task-definition "$td_arn" --query 'taskDefinition.[taskRoleArn,executionRoleArn]' --output text | grep -q '<IAM-role-name>' && echo "$task_arn"
done
```

For container image vulnerabilities:

```bash=
# Replace <your-cluster-name> and <image-name>; describe-tasks accepts at most 100 task ARNs per call
task_arns=$(aws ecs list-tasks --cluster <your-cluster-name> --desired-status RUNNING --query 'taskArns' --output text)
aws ecs describe-tasks --cluster <your-cluster-name> --tasks $task_arns --query "tasks[?containers[?contains(image, '<image-name>')]].taskArn" --output text
```

## Tag for Ongoing Investigation

Mark the compromised task and its EC2 instance with a tag indicating that they're under active investigation. This serves as a warning to administrators not to interfere until the investigation is complete. The task and the EC2 instance are tagged through different APIs: the ECS API for the task, and the EC2 `create-tags` AWS CLI command for the instance.
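Tagging the task itself goes through the ECS `tag-resource` command, whose tag syntax uses lowercase `key` and `value` fields. The following is a minimal sketch, assuming the task uses the long ARN format that ECS requires for tagging tasks; the function name `tag_task_for_forensics` is hypothetical.

```bash
# Hypothetical helper: tag a running ECS task as under investigation.
# ECS tags use lowercase key=/value= fields, unlike EC2's Key=/Value=.
# Usage: tag_task_for_forensics <task-arn>
tag_task_for_forensics() {
  local task_arn="$1"
  # Add the tag, then print the task's tags to confirm the change
  aws ecs tag-resource --resource-arn "$task_arn" --tags key=forensic,value=true &&
    aws ecs list-tags-for-resource --resource-arn "$task_arn" --query 'tags' --output text
}
```

The EC2 instance that hosts the task is tagged separately with `create-tags`.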
Here's how you can add a `forensic=true` tag to the ECS instance:

```bash=
# Replace 'YourInstanceId' with the ID of your ECS container instance
instance_id="YourInstanceId"

# Add a "forensic=true" tag to the EC2 instance
aws ec2 create-tags --resources "$instance_id" --tags Key=forensic,Value=true
```

In this script:

1. Replace `YourInstanceId` with the ID of the ECS container instance (EC2 instance) that you want to tag for forensic purposes.
2. The `create-tags` command adds the `forensic=true` tag to the EC2 instance, identifying it as under forensic investigation.

After running this script, the instance carries the `forensic=true` tag, indicating that it's part of a forensic investigation. The tag doesn't protect the instance from termination; enable termination protection separately.

## Isolate the Task with a Security Group

Security groups only support allow rules, so you can't add a rule that denies traffic. Instead, replace the compromised task's security groups with a security group that has no inbound or outbound rules, which denies all traffic. This cuts off network access to the task and can help halt an ongoing attack. Because security groups are stateful, however, connections that are already established may not be interrupted immediately. You can do this with the AWS CLI `create-security-group` and `revoke-security-group-egress` commands.

First, tag the existing security group with a key-value pair that you can use to report on it during the forensic investigation.

```bash=
# Replace 'YourSecurityGroupName' with the name of the security group you want to tag
security_group_name="YourSecurityGroupName"

# Find the security group ID by name
security_group_id=$(aws ec2 describe-security-groups --filters "Name=group-name,Values=$security_group_name" --query "SecurityGroups[0].GroupId" --output text)

# Tag the security group with "forensic=true"
aws ec2 create-tags --resources "$security_group_id" --tags Key=forensic,Value=true
```

Next, create a new security group that blocks all traffic.
```bash=
# Replace 'YourVpcId' with the ID of the VPC that contains the compromised task or instance
vpc_id="YourVpcId"

# Create the security group and capture its ID.
# A new security group has no inbound rules, so all inbound traffic is already denied.
security_group_id=$(aws ec2 create-security-group --group-name DenyAllSG \
  --description "Security Group that denies all traffic" --vpc-id "$vpc_id" \
  --query 'GroupId' --output text)

# A new security group includes an outbound rule that allows all IPv4 traffic; revoke it.
# In a VPC with an IPv6 CIDR block, also check for and revoke the ::/0 outbound rule.
aws ec2 revoke-security-group-egress --group-id "$security_group_id" \
  --ip-permissions '[{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]'
```

Then attach the new security group to the compromised resource in place of its existing security groups. For an EC2 container instance, you can use `aws ec2 modify-instance-attribute --instance-id <instance-id> --groups <security-group-id>`, which replaces the instance's security groups. For tasks that use the `awsvpc` network mode, the security groups belong to the task's elastic network interface.

:::warning
:warning: A security group on the task may prove ineffective if an attacker has gained access to the underlying ECS host. If you suspect that has happened, use security groups to isolate the compromised host from other hosts. Tasks that don't use the `awsvpc` network mode share the host's security groups, and changing a host's security groups affects all containers running on that host.
:::

# Revoke Temporary Security Credentials Assigned to the Task or Container Instance if Necessary

## Revoke IAM Roles if Necessary

If the IAM role assigned to the task or EC2 instance is compromised, revoke its permissions to prevent further access to AWS resources. Roles are often shared by several tasks or services, so make sure this doesn't impact other workloads.
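Temporary credentials that the task (or an attacker who copied them) already holds stay valid until they expire, and they can use whatever permissions the role still grants. IAM's "revoke active sessions" action addresses this by attaching an inline policy named `AWSRevokeOlderSessions` that denies all actions to credentials issued before a given time. The following is a minimal sketch that applies an equivalent policy with the AWS CLI; the function name `revoke_role_sessions` is hypothetical. Because the task can later receive freshly issued credentials, which this policy doesn't deny, combine it with removing the role's permissions.

```bash
# Hypothetical helper: deny every session of a role that was issued before "now",
# mirroring the IAM console's "Revoke active sessions" action.
# Usage: revoke_role_sessions <role-name>
revoke_role_sessions() {
  local role_name="$1" now policy
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)

  # Deny all actions to credentials whose token was issued before $now
  policy=$(printf '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":["*"],"Resource":["*"],"Condition":{"DateLessThan":{"aws:TokenIssueTime":"%s"}}}]}' "$now")

  # put-role-policy creates (or replaces) the inline policy on the role
  aws iam put-role-policy --role-name "$role_name" \
    --policy-name AWSRevokeOlderSessions --policy-document "$policy"
}
```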
```bash=
# Replace 'YourClusterName' and 'YourTaskARN' with your cluster and the ARN of the specific task
cluster_name="YourClusterName"
task_arn="YourTaskARN"

# The task role is defined in the task definition, so look up the task definition first
task_definition_arn=$(aws ecs describe-tasks --cluster "$cluster_name" --tasks "$task_arn" --query "tasks[0].taskDefinitionArn" --output text)
task_role_arn=$(aws ecs describe-task-definition --task-definition "$task_definition_arn" --query "taskDefinition.taskRoleArn" --output text)
role_name=$(basename "$task_role_arn")

# Tag the task role so it can be identified during the investigation
aws iam tag-role --role-name "$role_name" --tags Key=forensic,Value=true

# Detach the managed policies from the task role (inline policies must be removed separately with delete-role-policy)
policy_arns=$(aws iam list-attached-role-policies --role-name "$role_name" --query "AttachedPolicies[].PolicyArn" --output text)
for policy_arn in $policy_arns; do
  aws iam detach-role-policy --role-name "$role_name" --policy-arn "$policy_arn"
done
```

Because IAM evaluates a role's policies on every request, detaching policies also removes those permissions from sessions that are already active; it doesn't remove inline policies, though. To additionally deny all credentials issued before a specific time, revoke the role's active sessions as described in [Revoking IAM role temporary security credentials](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_revoke-sessions.html#revoke-session).

# Prevent Task Scheduling

:::info
:warning: This guidance doesn't apply to Fargate, where each task runs in its own isolated environment and there are no container instances to exclude. Instead, isolate the affected Fargate task by attaching a security group that denies all inbound and outbound traffic.
:::

To prevent tasks from being scheduled onto the compromised container instance in Amazon Elastic Container Service (ECS), you can use placement constraints. Placement constraints are rules that control where tasks are placed in your ECS cluster: a task is placed only on container instances that satisfy them. This helps you maintain the integrity of your investigation and avoid disruptions to other tasks.

To keep tasks off a specific container instance, mark the instance and add a constraint that excludes the mark. Here's how you can do it using the AWS CLI:

1. Identify a unique attribute for the container instance that the constraint can match. Placement constraints evaluate ECS attributes (built-in attributes such as `ecs.instance-type`, or custom attributes that you set with `put-attributes`) rather than EC2 tags, so set a custom attribute such as `forensic=true` on the compromised container instance.
2. Create a placement constraint that only allows container instances without this attribute value, so that no task is placed on the marked instance.

Here's an example of how you can mark the instance and add a placement constraint that excludes it:

```bash=
# Replace these placeholders with your cluster, service, and compromised container instance
cluster_name="YourClusterName"
service_name="YourServiceName"
container_instance_arn="YourContainerInstanceArn"

# Mark the compromised container instance with a custom "forensic=true" attribute
aws ecs put-attributes --cluster "$cluster_name" \
  --attributes name=forensic,value=true,targetType=container-instance,targetId="$container_instance_arn"

# Only place the service's tasks on container instances that aren't marked forensic=true
aws ecs update-service --cluster "$cluster_name" --service "$service_name" \
  --placement-constraints '[{"type": "memberOf", "expression": "not(attribute:forensic == true)"}]'
```

In this example, `put-attributes` adds the custom attribute to the compromised container instance, and `update-service` adds a `memberOf` placement constraint whose cluster query expression excludes any container instance marked `forensic=true`. For tasks that aren't part of a service, set the same constraint in the task definition or when calling `run-task`. Check how the service update behaves in your environment before applying it: if it starts a new deployment, the scheduler replaces the service's running tasks, which can include the compromised task. With the constraint in place, new tasks stay off the compromised instance during the forensic investigation while the rest of the cluster operates normally.

## Protect Against Termination

To prevent an attacker from terminating the compromised EC2 instance, enable termination protection and, if the instance belongs to an Auto Scaling group, scale-in protection for the affected instance.

:::warning
:warning: You cannot enable termination protection on a Spot Instance.
:::

To prevent an Amazon ECS container instance from being terminated, enable termination protection on the EC2 instance. Termination protection is set at the EC2 instance level rather than the ECS level, and it doesn't prevent Amazon EC2 Auto Scaling from terminating the instance. For instances in an Auto Scaling group, also run `aws autoscaling set-instance-protection --instance-ids <instance-id> --auto-scaling-group-name <asg-name> --protected-from-scale-in`. Scale-in protection doesn't prevent replacement after a failed health check, so also consider suspending the group's `ReplaceUnhealthy` process. Here's how you can enable termination protection using the AWS CLI:
Here's how you can do it using the AWS CLI: ```bash= # Replace 'YourInstanceId' with the ID of your ECS instance instance_id="YourInstanceId" # Enable termination protection for the EC2 instance aws ec2 modify-instance-attribute --instance-id $instance_id --no-disable-api-termination ``` In this script: 1. Replace `YourInstanceId` with the ID of the ECS instance (EC2 instance) that you want to protect from termination. 2. We use the `modify-instance-attribute` command to enable termination protection by specifying the `--no-disable-api-termination` flag. Enabling termination protection will prevent the instance from being terminated through the AWS Management Console, AWS CLI, or other AWS APIs. This protection applies at the EC2 instance level. Keep in mind that enabling termination protection is a security measure and should be used with caution. Be sure that it's necessary and aligns with your operational and security requirements, as it may affect your ability to manage and terminate the instance in the future. To remove termination protection, you can use the `--disable-api-termination` flag in the `modify-instance-attribute` command. # Collect Forensic Artifacts Capture relevant artifacts, such as operating system memory, processes, and network states on the affected EC2 instance for later forensic analysis. Collect the memory of the operating system, including the Docker daemon (or any other container runtime) and its associated subprocesses per container. This can be achieved using tools such as [LiME](https://github.com/504ensicsLabs/LiME) and [Volatility](https://www.volatilityfoundation.org/), or through more advanced solutions like the [Automated Forensics Orchestrator for Amazon EC2](https://aws.amazon.com/solutions/implementations/automated-forensics-orchestrator-for-amazon-ec2/), which leverage these tools for comprehensive memory capture and analysis. Perform a comprehensive data collection process, including: 1. 
**Collect the memory of the operating system**: Include the Docker daemon (or any other container runtime) and its associated subprocesses per container. This can be achieved using tools such as [LiME](https://github.com/504ensicsLabs/LiME) and [Volatility](https://www.volatilityfoundation.org/), or through more advanced solutions like the [Automated Forensics Orchestrator for Amazon EC2](https://aws.amazon.com/solutions/implementations/automated-forensics-orchestrator-for-amazon-ec2/), which leverage these tools for comprehensive memory capture and analysis. 1. **Capture Network State and Processes**: - Conduct a netstat tree dump to record details about running processes and open ports. This step ensures the capture of the Docker daemon and its associated subprocesses per container. 2. **Save Container-Level State**: - Preserve the state of containers before any evidence is altered. Utilize container runtime capabilities to extract information about currently running containers. For example, if you're using Docker: - Run `docker top CONTAINER` to view processes running within the container. - Use `docker logs CONTAINER` to access daemon-level logs. - Execute `docker inspect CONTAINER` to retrieve various details about the container. - Similar actions can be achieved with containerd using the `nerdctl` CLI instead of Docker (e.g., `nerdctl inspect`). Depending on the container runtime, additional commands like `docker diff` for filesystem changes or `docker checkpoint` for saving container states, including volatile memory (RAM), may be available. 3. **Pause the Container for Forensic Capture**: - Temporarily halt the container to facilitate forensic data capture. 4. **Snapshot the Instance's EBS Volumes**: - Take snapshots of the Elastic Block Store (EBS) volumes associated with the instance to ensure comprehensive data preservation. 
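Steps 2 through 5 can be scripted so that they run the same way every time. The following is a minimal sketch, assuming the Docker runtime and a root shell on the container instance (for example, through Session Manager); the function names, the output layout, and the use of `ss` and `sha256sum` are assumptions, not part of any AWS tooling. Memory acquisition (step 1) is left out because LiME needs a kernel module built for the host's exact kernel. `snapshot_instance_volumes` runs wherever the AWS CLI is configured and uses EC2 `create-snapshots`, which snapshots all EBS volumes attached to an instance in one call.

```bash
# Hypothetical helpers for the collection steps above.

# capture_container_state runs ON the container instance (requires root and Docker).
# Usage: capture_container_state <container-id> [output-dir]
capture_container_state() {
  local cid="$1" out="${2:-/tmp/forensics/$1}"
  mkdir -p "$out" || return 1

  # Host process tree and network sockets (includes the container runtime and its subprocesses)
  ps auxf > "$out/host-processes.txt"
  ss -tuanp > "$out/host-sockets.txt"

  # Container-level state, saved before anything else changes
  docker top "$cid" > "$out/container-top.txt"
  docker inspect "$cid" > "$out/container-inspect.json"
  docker logs "$cid" > "$out/container-logs.txt" 2>&1
  docker diff "$cid" > "$out/container-diff.txt"

  # Freeze the container's processes so its state stops changing
  docker pause "$cid"

  # Hash the collected files to help demonstrate their integrity later
  (cd "$out" && sha256sum -- *.txt *.json) > "$out/SHA256SUMS"
}

# snapshot_instance_volumes runs wherever the AWS CLI is configured.
# Usage: snapshot_instance_volumes <instance-id>
snapshot_instance_volumes() {
  local instance_id="$1"
  aws ec2 create-snapshots \
    --instance-specification InstanceId="$instance_id" \
    --description "Forensic snapshot of $instance_id" \
    --tag-specifications 'ResourceType=snapshot,Tags=[{Key=forensic,Value=true}]' \
    --query 'Snapshots[].SnapshotId' --output text
}
```

Keep the output directory off the volumes you plan to snapshot if you want the snapshots to reflect the disk as it was before collection.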
This sequence of actions is vital for forensic data collection and analysis, aiding the investigation of security incidents while maintaining data integrity.

## Redeploy Compromised Task or Workload

After completing the forensic analysis, consider redeploying the compromised task or workload. Make sure that any vulnerabilities are addressed before doing so.

# Recommendations

* Review the AWS Security Incident Response whitepaper for comprehensive guidance on handling security incidents on AWS, including Amazon ECS workloads.
* Practice security exercises, such as game days, to improve your team's response to security incidents. Simulate both offensive (red team) and defensive (blue team) scenarios to strengthen your incident response capabilities.
* Refer to the Amazon ECS best practices guidance for security.
* Conduct penetration tests against your ECS clusters periodically to discover vulnerabilities and misconfigurations. Review the AWS guidelines for penetration testing before conducting tests.
* Use tools like AWS Systems Manager to create automation documents both for stopping the affected task and for collecting forensic information with forensic tools like LiME and Volatility.
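As a starting point before building full Automation documents, Systems Manager Run Command can execute read-only collection commands on the isolated instance without opening an inbound port. The following is a minimal sketch using the AWS-managed `AWS-RunShellScript` document; the function name `run_forensic_commands`, the command list, and the S3 bucket that receives the output are assumptions. It only works if the SSM Agent on the instance can still reach the Systems Manager endpoints, which a deny-all security group blocks unless you deliberately allow that traffic.

```bash
# Hypothetical helper: run read-only collection commands on an instance via SSM Run Command.
# Usage: run_forensic_commands <instance-id> <output-s3-bucket> <container-id>
run_forensic_commands() {
  local instance_id="$1" bucket="$2" cid="$3" params

  # Build the document parameters as JSON; AWS-RunShellScript takes a "commands" list
  params=$(printf '{"commands":["ps auxf","ss -tuanp","docker top %s","docker inspect %s"]}' "$cid" "$cid")

  # Send the command and print its ID; command output is written to the S3 bucket
  aws ssm send-command \
    --instance-ids "$instance_id" \
    --document-name "AWS-RunShellScript" \
    --comment "Forensic collection" \
    --parameters "$params" \
    --output-s3-bucket-name "$bucket" \
    --query 'Command.CommandId' --output text
}
```

The returned command ID can be passed to `aws ssm get-command-invocation --command-id <command-id> --instance-id <instance-id>` to check the command's status and output.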