# App Mesh on Amazon ECS - Part 1
*Part 2 of this guide can be found [**here**](https://hackmd.io/@ForsakenIdol/app-mesh-on-amazon-ecs-part-2).*
[**AWS App Mesh**](https://aws.amazon.com/app-mesh/) is a [service mesh](https://aws.amazon.com/what-is/service-mesh/) solution that provides a number of benefits for microservices.
- Standardized communication protocols invisible to applications within container groupings.
- Observability solutions decoupled from application-level implementation.
- High visibility into service-to-service communication for easier debugging and troubleshooting.
Despite these benefits, App Mesh can prove rather finicky to get started with, and a fair portion of the existing learning materials for [ECS](https://docs.aws.amazon.com/app-mesh/latest/userguide/getting-started-ecs.html) and [EKS](https://docs.aws.amazon.com/app-mesh/latest/userguide/getting-started-kubernetes.html) gloss over a number of important conceptual details, either leaving the base application implementation up to the user or jumping straight to a fully-fledged architecture. This makes for a somewhat unstable introduction to the world of service meshes, complicates further App Mesh topics, and has the potential to discourage users from exploring more complex concepts, such as [**TLS configuration**](https://docs.aws.amazon.com/app-mesh/latest/userguide/tls.html) and [**observability routing**](https://docs.aws.amazon.com/app-mesh/latest/userguide/observability.html), which can further improve application security and visibility respectively.
In this tutorial, I'll aim to provide an **in-depth introduction** to App Mesh on **Amazon ECS** and explain the core concepts of this service from the ground up. We'll start with a simple mesh and cover several topics in the following order:
1. Configuring the **initial infrastructure**.
2. Configuring the simplest mesh - just 2 **virtual nodes**.
3. Incorporating a **virtual router** fronting 2 more virtual nodes.
4. Incorporating a **virtual gateway** to allow external users to access specific microservices within the mesh.
5. Securing service mesh communication by incorporating **TLS authentication**.
6. Incorporating the [**AWS Distro for OpenTelemetry**](https://aws.amazon.com/otel/) to route **observability signals** to other AWS services.
**Part 1** of the guide (the part you're currently reading) covers topics #1 to #4, which provide an in-depth introduction to core App Mesh concepts. [**Part 2**](https://hackmd.io/@ForsakenIdol/app-mesh-on-amazon-ecs-part-2) contains topics #5 onwards and covers more advanced concepts.
Throughout this tutorial, we'll be making use of the [**`forsakenidol/colorapp`**](https://hub.docker.com/r/forsakenidol/colorapp/) DockerHub image as our microservice of choice for demonstration purposes. This image is a build of a very simple web application inspired by [`subfuzion/colorteller`](https://hub.docker.com/r/subfuzion/colorteller), but it also includes a shell with `curl` and `nslookup` for testing purposes (which `subfuzion/colorteller` lacks). The `colorapp` image has the following properties:
- It runs on port 80 by default, but you can map this port to any value in ECS. We'll be using the default port 80 in this tutorial.
- It accepts a `COLOR` environment variable to which you can pass a string.
- It only has 2 paths - a root path `/` which simply returns the value of `COLOR`, and a `/healthcheck` path which returns a 200 status code and the word `hi`.
Feel free to pull the image onto your local machine to experiment with it.
### Prerequisites
You'll need the following knowledge to fully understand this tutorial:
- An understanding of DNS in the context of AWS Cloud Map.
- A basic understanding of what a service mesh is - we'll discuss service mesh components in more depth in this tutorial, but you should at least be familiar with the concept of a service mesh.
Should you wish to follow along with this tutorial, you'll need the following resources.
- A VPC with private subnets that have outbound internet access through a NAT gateway, and an ECS cluster configured in your AWS account. You don't need any container instances, since our architecture will be fully [**Fargate-compatible**](https://docs.aws.amazon.com/AmazonECS/latest/userguide/what-is-fargate.html).
- Make sure your VPC has the `Enable DNS hostnames` and `Enable DNS support` settings enabled (checkbox ticked if you're in the AWS console). This is a requirement for using Cloud Map, which we will be making extensive use of in this tutorial.
- The AWS CLI v2 installed on your development machine.
Throughout this tutorial, we'll incorporate the AWS CLI as much as possible to adopt an **infrastructure-as-code** approach to architectural configuration. We'll be storing a lot of our resource configs in JSON files that you'll be able to push to an upstream repository if you so wish.
Let's get started.
## 1. Setting up the Basic Infrastructure
Before we can configure virtual nodes, we must first set up the key components of our infrastructure required to support those virtual nodes. The components we'll be configuring in this section are the DNS namespace for our services, and the parent service mesh in App Mesh.

We'll be using [**Cloud Map**](https://aws.amazon.com/cloud-map/) to manage DNS records within our mesh. This is the upstream service ECS prefers when configuring service discovery. Let's start by creating a Cloud Map [namespace](https://docs.aws.amazon.com/cloud-map/latest/dg/working-with-namespaces.html) - create the first file `01a-namespace.json` and pass it to the following AWS CLI command. Replace `<vpc-id>` with the ID of your pre-existing VPC. Because we're creating a private namespace that will **only** be resolvable within our VPC, Cloud Map service records for this namespace will be tied to that VPC, and we'll launch all of our resources into that VPC as well.
```json=
// 01a-namespace.json
{
    "Name": "colors.local",
    "Description": "Namespace for the color mesh.",
    "Vpc": "<vpc-id>"
}
```
(***Note**: The following command returns an `OperationId` that you can use to track the status of the namespace's creation operation, since it will complete asynchronously to the command. To track the status of the creation process, pass the OperationId to* `aws servicediscovery get-operation --operation-id`.)
```shell
aws servicediscovery create-private-dns-namespace --cli-input-json file://01a-namespace.json
```
<div style="background-color: #88f7b3; padding: 2rem 2rem; margin-bottom: 1.5rem;">
<b>Note</b>: We'll be creating quite a fair number of JSON manifests for our infrastructure in this tutorial. Some manifests have a dependency on specific resources being created, or there may be a logical order in which resources should be created. A good way to remember the order in which those manifests are created is to prefix the manifest name with its applied order, which we'll be doing throughout this tutorial.
</div>
When we proceed to register services within this namespace, those services will be resolvable within the VPC under the `colors.local` suffix. For example, if we register a service called `red`, the FQDN for that service will be `red.colors.local`, and so on for any other service we register into this namespace. The default TTL of SOA records in Cloud Map is 15 seconds, which is sufficient for our purposes - we won't be managing SOA records in this tutorial.
Let's go ahead and register 2 Cloud Map services for use in the next section - `red` and `blue`. You can fetch your `<cloud-map-namespace-id>` from the namespace view in the Cloud Map service.
```json=
// 01b-red.json
{
    "Name": "red",
    "NamespaceId": "<cloud-map-namespace-id>",
    "Description": "Red color service.",
    "DnsConfig": {
        "RoutingPolicy": "MULTIVALUE",
        "DnsRecords": [
            {
                "Type": "A",
                "TTL": 300
            }
        ]
    }
}
```
```json=
// 01b-blue.json
{
    "Name": "blue",
    "NamespaceId": "<cloud-map-namespace-id>",
    "Description": "Blue color service.",
    "DnsConfig": {
        "RoutingPolicy": "MULTIVALUE",
        "DnsRecords": [
            {
                "Type": "A",
                "TTL": 300
            }
        ]
    }
}
```
<div style="background-color: #88f7b3; padding: 2rem 2rem; margin-bottom: 1.5rem;">
<b>Note</b>: If you're wondering how these JSON manifests should be structured, you can append <code>--generate-cli-skeleton</code> to any AWS CLI module's command to create a skeleton of the file you need to pass to <code>--cli-input-json</code>.
</div>
Then, apply the 2 files to our namespace. Whenever files share the same letter prefix, you can apply them in any order.
```shell=
aws servicediscovery create-service --cli-input-json file://01b-red.json
aws servicediscovery create-service --cli-input-json file://01b-blue.json
```
Make a note of the service's `Id` and `Arn` fields in the output of each `create-service` command - we'll be using them in a later section. Later on in this tutorial, `red` and `blue` will be served directly by corresponding virtual services in App Mesh. We now have 2 Cloud Map services at our disposal, and their FQDNs are provided below:
```
red.colors.local
blue.colors.local
```
This is all we'll be doing at this point with Cloud Map in our infrastructure configuration, so let's now move over to the **App Mesh service console** and create a service mesh, which we'll begin to populate in section #2.
```json=
// 01c-mesh.json
{
    "meshName": "color-mesh",
    "spec": {
        "egressFilter": {
            "type": "DROP_ALL"
        }
    }
}
```
```shell
aws appmesh create-mesh --cli-input-json file://01c-mesh.json
```
An egress filter of `DROP_ALL` means that any traffic that...
- Is visible to Envoy,
- Originated from a virtual node in our `color-mesh` service mesh, and
- Isn't destined for a service in our mesh,
... will not be allowed to leave the mesh.
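To confirm the mesh was created with the intended egress filter, you can describe it - the `spec.egressFilter` field in the output should read `DROP_ALL`:
```sh
aws appmesh describe-mesh --mesh-name color-mesh
```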
Congratulations! You've set up the DNS records for our services and created the service mesh in App Mesh. Let's move onto section #2 and start populating this mesh with virtual components and implementing those components in ECS.
## 2. App Mesh: The First 2 Virtual Nodes
App Mesh virtual nodes involve an application, which can consist of one or more containers, and an [**Envoy proxy sidecar**](https://www.envoyproxy.io/), which is used to intercept, route, and / or drop traffic emitted by the application. A single virtual node can have one or more **backends**, which are other applications in the mesh that this virtual node is allowed to communicate with. Communication can be **asymmetrical**, i.e. application A may have application B configured as a backend, allowing A to initiate communication with B, but B may not necessarily have A as a backend, preventing B from initiating communication with A. If this sounds confusing, don't worry - we're going to explore this configuration further in this section!

There's 2 parts to a virtual node in App Mesh:
1. The node's **specification** in App Mesh, including configuration parameters such as open ports, access logging, service discovery, and allowed backends.
2. The node's **implementation** in an upstream service such as ECS (the focus of this tutorial), EKS, or EC2.
The implementation is done by the Envoy proxy sidecar, which, when given the [correct permissions](https://docs.aws.amazon.com/app-mesh/latest/userguide/proxy-authorization.html) and [configuration variable(s)](https://docs.aws.amazon.com/app-mesh/latest/userguide/envoy-config.html), can read the App Mesh specification of the node it has to implement, and configure itself accordingly to match that specification. We declare the specification in the App Mesh service, then tell the Envoy proxy to read and implement that specification.
<div style="background-color: #88f7b3; padding: 2rem 2rem; margin-bottom: 1.5rem;">
<b>Note</b>: Are you familiar with <a href="https://kubernetes.io/"><b>Kubernetes</b></a>? This is not too dissimilar from the <a href="https://kubernetes.io/docs/tasks/manage-kubernetes-objects/declarative-config/"><b>declarative configuration</b></a> that Kubernetes uses, where we specify the desired state of an object, and Kubernetes uses its upstream implementation to create and configure resources that match our desired specification. The only difference with App Mesh is that we need to explicitly create the Envoy proxy that will be used to implement our virtual resources, before Envoy handles its own auto-configuration. App Mesh will not <i>automatically</i> create the proxy for us, but once created and given an App Mesh resource, it <b>will</b> automatically configure itself <b>as</b> that resource.
</div>
### Declaring the Architecture
The simplest architecture for a service mesh involves just 2 virtual nodes. Let's start by configuring those using our `red` and `blue` Cloud Map services we created in section #1. We'll start with the `red` virtual node.
```json=
// 02a-red-node.json
{
    "virtualNodeName": "red",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ],
        "serviceDiscovery": {
            "awsCloudMap": {
                "namespaceName": "colors.local",
                "serviceName": "red"
            }
        },
        "logging": {
            "accessLog": {
                "file": {
                    "format": {
                        "text": "[%START_TIME%] %REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL% %RESPONSE_CODE%\n"
                    },
                    "path": "/dev/stdout"
                }
            }
        }
    }
}
```
```shell
aws appmesh create-virtual-node --cli-input-json file://02a-red-node.json
```
What did we just create?
- Our `red` virtual node will have a single open port - 80 - which corresponds to the aforementioned `colorapp` port when this node is implemented.
- The virtual node will be reachable at the `red.colors.local` FQDN. This corresponds to the `red` service in the `colors.local` Cloud Map namespace - remember this for later!
- The Envoy proxy implementing this virtual node will log its access patterns in a **specific text format** to the `stdout` stream. This format is pieced together using [**supported format strings**](https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage#configuration) in the Envoy documentation.
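You can double-check all of the above by describing the virtual node we just created - the listener, service discovery, and logging settings in the output should match the manifest:
```sh
aws appmesh describe-virtual-node --mesh-name color-mesh --virtual-node-name red
```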
Let's front that virtual node with a virtual service.
```json=
// 02b-red-service.json
{
    "virtualServiceName": "red.colors.local",
    "meshName": "color-mesh",
    "spec": {
        "provider": {
            "virtualNode": {
                "virtualNodeName": "red"
            }
        }
    }
}
```
```shell
aws appmesh create-virtual-service --cli-input-json file://02b-red-service.json
```
Virtual nodes are not served directly - they either sit directly behind a virtual service (as in this scenario), or behind a **virtual router**, which in turn sits behind a virtual service (which we'll explore later in this tutorial). A virtual service cannot serve traffic unless a **provider** is specified - in this case, the provider is the `red` virtual node we created previously.
That's our `red` node - now, let's create a `blue` node.
```json=
// 02c-blue-node.json
{
    "virtualNodeName": "blue",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ],
        "serviceDiscovery": {
            "awsCloudMap": {
                "namespaceName": "colors.local",
                "serviceName": "blue"
            }
        },
        "logging": {
            "accessLog": {
                "file": {
                    "format": {
                        "json": [
                            {
                                "key": "start_time",
                                "value": "%START_TIME%"
                            },
                            {
                                "key": "method",
                                "value": "%REQ(:METHOD)%"
                            },
                            {
                                "key": "request_path",
                                "value": "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
                            },
                            {
                                "key": "protocol",
                                "value": "%PROTOCOL%"
                            },
                            {
                                "key": "response_code",
                                "value": "%RESPONSE_CODE%"
                            }
                        ]
                    },
                    "path": "/dev/stdout"
                }
            }
        },
        "backends": [
            {
                "virtualService": {
                    "virtualServiceName": "red.colors.local"
                }
            }
        ]
    }
}
```
```shell
aws appmesh create-virtual-node --cli-input-json file://02c-blue-node.json
```
This looks rather different from our `02a-red-node.json` file - what's changed here?
- Our `blue` virtual node has the `red.colors.local` virtual service listed as a backend. This means that `blue` will be able to contact `red` through the `red.colors.local` virtual service, but **not** vice versa, since we didn't configure `red`'s backends to include the `blue` application.
- Envoy will log in a different format once it implements the `blue` node. Here, we're telling it to log in the `json` format, albeit with the same 5 format strings as the `red` virtual node's text logging configuration.
Let's also configure a virtual service for our `blue` application.
```json=
// 02d-blue-service.json
{
    "virtualServiceName": "blue.colors.local",
    "meshName": "color-mesh",
    "spec": {
        "provider": {
            "virtualNode": {
                "virtualNodeName": "blue"
            }
        }
    }
}
```
```shell
aws appmesh create-virtual-service --cli-input-json file://02d-blue-service.json
```
Now, return to the top-level of our working directory and let's briefly review the infrastructure we've just configured.
- We have 2 virtual nodes - a `red` node and a `blue` node.

- Each virtual node is served by a virtual service - `red.colors.local` and `blue.colors.local`. We must give a virtual service the same name that meshed applications will use to query it - here, that's the FQDN as per the Cloud Map service.

- `blue` will be able to contact `red` through `red.colors.local` because the relevant service has been configured as a backend. However, the reverse is not true - `red` cannot query `blue.colors.local` because `red` has no virtual service backends.
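You can confirm this picture from the CLI as well - list the virtual nodes and virtual services in the mesh, then describe the `blue` node to see `red.colors.local` listed under its backends:
```sh
aws appmesh list-virtual-nodes --mesh-name color-mesh
aws appmesh list-virtual-services --mesh-name color-mesh
aws appmesh describe-virtual-node --mesh-name color-mesh --virtual-node-name blue
```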
### Implementing the Architecture
Now, let's implement these 2 virtual nodes in ECS. Before we can do this, however, there are a few things we need to configure.
We'll start by creating a [task role](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html) and a [task execution role](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html) for our ECS tasks.
- The task role requires 2 sets of permissions - the AWS-managed `AWSAppMeshEnvoyAccess` IAM policy to provide `appmesh:StreamAggregatedResources` permissions for our Envoy proxy to fetch virtual node information from App Mesh, and another policy that will allow us to execute [**ECS Exec**](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html) commands on our tasks for testing purposes ([these](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html#ecs-exec-required-iam-permissions) are the required permissions for ECS Exec). Notably, the AWS-managed policy for Envoy provides permissions for a single Envoy container to fetch **any** virtual resource information from App Mesh. In a small mesh of our size with only 4 virtual nodes (eventually), this isn't a big problem, but for bigger production meshes, you may want to consider duplicating this policy and scoping it down to a single policy for each virtual resource.

- The task execution role requires the standard `AmazonECSTaskExecutionRolePolicy` AWS-managed IAM policy, plus an additional inline policy that provides the `logs:CreateLogGroup` permission ([this permission is not included in `AmazonECSTaskExecutionRolePolicy`](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonECSTaskExecutionRolePolicy.html) as of the time of writing), which allows ECS to create the log group for our task if it does not already exist.
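If you'd like to create both roles from the CLI as well, here's a minimal sketch. The role names (`color-mesh-task-role` and `color-mesh-task-execution-role`) and the inline policy names are placeholders of my own choosing - adjust them to suit your environment.
```sh
# Trust policy shared by both roles - allows ECS tasks to assume them.
cat > ecs-tasks-trust.json << 'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": { "Service": "ecs-tasks.amazonaws.com" },
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF

# Task role: AWSAppMeshEnvoyAccess for the Envoy proxy, plus the SSM messages
# permissions required by ECS Exec.
aws iam create-role --role-name color-mesh-task-role \
  --assume-role-policy-document file://ecs-tasks-trust.json
aws iam attach-role-policy --role-name color-mesh-task-role \
  --policy-arn arn:aws:iam::aws:policy/AWSAppMeshEnvoyAccess
aws iam put-role-policy --role-name color-mesh-task-role \
  --policy-name ecs-exec-permissions \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": [
        "ssmmessages:CreateControlChannel",
        "ssmmessages:CreateDataChannel",
        "ssmmessages:OpenControlChannel",
        "ssmmessages:OpenDataChannel"
      ],
      "Resource": "*"
    }]
  }'

# Task execution role: the standard execution policy, plus logs:CreateLogGroup
# so ECS can create the color-mesh log group on our behalf.
aws iam create-role --role-name color-mesh-task-execution-role \
  --assume-role-policy-document file://ecs-tasks-trust.json
aws iam attach-role-policy --role-name color-mesh-task-execution-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
aws iam put-role-policy --role-name color-mesh-task-execution-role \
  --policy-name create-log-group \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": "logs:CreateLogGroup",
      "Resource": "*"
    }]
  }'
```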

We'll also configure a few more miscellaneous resources.
- Create a **security group** for our tasks. We need to specify one since we're using the `awsvpc` network mode in Fargate - ECS will create the Fargate task ENIs in our VPC, and we need to attach security group(s) to these ENIs to control traffic to them. For our purposes, we can create a [**self-referencing security group**](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/security-group-rules-reference.html#sg-rules-other-instances), which allows any traffic as long as it originates from a resource that also has the same security group attached. This works for us because we'll only be attaching this security group to our meshed applications. Let's call this security group `colorapp-sg` (a CLI sketch for creating it follows the example output below).

- Get the ARNs of the `red` and `blue` Cloud Map services in our `colors.local` namespace - we'll need to specify these in a few documents later down the line. Each service ID is of the form `srv-xxxxxxxxxx` and can be found in the Cloud Map console; pass it to `aws servicediscovery get-service --id srv-xxxxxxxxxx`, and the service ARN will appear in the command's output. Run this once for each of the `red` and `blue` service IDs in Cloud Map.
```sh=
$ aws servicediscovery get-service --id <color-service-id>
{
    "Service": {
        "Id": "<color-service-id>",
        "Arn": "arn:aws:servicediscovery:eu-west-1:<account-id>:service/<color-service-id>",
        "Name": "red",
        "NamespaceId": "<colors-namespace>",
        "Description": "<color> color service.",
        "DnsConfig": {
            "NamespaceId": "<colors-namespace>",
            "RoutingPolicy": "MULTIVALUE",
            "DnsRecords": [
                {
                    "Type": "A",
                    "TTL": 300
                }
            ]
        },
        "Type": "DNS_HTTP",
        "CreateDate": "<date-service-was-created>",
        "CreatorRequestId": "xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    }
}
```
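As promised, here's a minimal sketch of how the self-referencing `colorapp-sg` security group described above could be created from the CLI - the group name and description are my own, and the group ID placeholder comes from the output of the first command.
```sh
# Create the security group in the same VPC as the Cloud Map namespace.
aws ec2 create-security-group --group-name colorapp-sg \
  --description "Self-referencing security group for meshed color tasks" \
  --vpc-id <vpc-id>

# Allow all traffic, but only from resources that carry this same security group.
aws ec2 authorize-security-group-ingress --group-id <colorapp-sg-id> \
  --ip-permissions 'IpProtocol=-1,UserIdGroupPairs=[{GroupId=<colorapp-sg-id>}]'
```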
Now, we can configure our task definition for the meshed applications. Let's start with the task definition for the `red` virtual node implementation. I'll show you what the JSON file should look like, then we'll break down some of the notable components in the file.
```json=
// 02e-red-td.json
{
    "family": "red_node",
    "taskRoleArn": "<task-role-arn>",
    "executionRoleArn": "<task-execution-role-arn>",
    "networkMode": "awsvpc",
    "containerDefinitions": [
        {
            "name": "red-app",
            "image": "forsakenidol/colorapp",
            "cpu": 128,
            "memory": 128,
            "portMappings": [
                {
                    "containerPort": 80,
                    "protocol": "tcp",
                    "name": "color-port"
                }
            ],
            "essential": true,
            "environment": [
                {
                    "name": "COLOR",
                    "value": "red"
                }
            ],
            "dependsOn": [
                {
                    "containerName": "envoy-sidecar",
                    "condition": "HEALTHY"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "color-mesh",
                    "awslogs-region": "<your-region>",
                    "awslogs-create-group": "true",
                    "awslogs-stream-prefix": "red-app-container"
                }
            },
            "healthCheck": {
                "command": [ "CMD-SHELL", "curl -f http://localhost/health/ || exit 1" ],
                "interval": 5,
                "timeout": 5,
                "retries": 3
            }
        },
        {
            "name": "envoy-sidecar",
            "image": "840364872350.dkr.ecr.ap-southeast-2.amazonaws.com/aws-appmesh-envoy:v1.27.0.0-prod",
            "cpu": 128,
            "memory": 384,
            "essential": true,
            "environment": [
                {
                    "name": "APPMESH_RESOURCE_ARN",
                    "value": "arn:aws:appmesh:<your-region>:<your-account-id>:mesh/color-mesh/virtualNode/red"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "color-mesh",
                    "awslogs-region": "<your-region>",
                    "awslogs-create-group": "true",
                    "awslogs-stream-prefix": "red-envoy-sidecar"
                }
            },
            "startTimeout": 30,
            "healthCheck": {
                "command": [ "CMD-SHELL", "curl -s http://localhost:9901/server_info | grep state | grep -q LIVE" ],
                "interval": 5,
                "retries": 3,
                "startPeriod": 10,
                "timeout": 2
            },
            "user": "1337"
        }
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "256",
    "memory": "512",
    "proxyConfiguration": {
        "type": "APPMESH",
        "containerName": "envoy-sidecar",
        "properties": [
            {
                "name": "IgnoredUID",
                "value": "1337"
            },
            {
                "name": "AppPorts",
                "value": "80"
            },
            {
                "name": "ProxyIngressPort",
                "value": "15000"
            },
            {
                "name": "ProxyEgressPort",
                "value": "15001"
            },
            {
                "name": "EgressIgnoredIPs",
                "value": "169.254.170.2,169.254.169.254"
            }
        ]
    }
}
```
- Make sure to add your task role ARN, task execution role ARN, region, and account ID where the placeholders are for each of these values.
- To ensure that traffic to and from our application container **must** pass through Envoy, we configure a startup ordering. Envoy must become healthy before our application container starts, and we configure this using the `dependsOn` task definition parameter.
- We follow the instructions in step #6 of [**this**](https://docs.aws.amazon.com/app-mesh/latest/userguide/getting-started-ecs.html#update-services) guide to configure the Envoy container. Bear in mind that the Envoy image version referenced here (current at the time of writing) may differ from the latest version available when you read this.
- Note the `"user": "1337"` specification, which explicitly sets the user ID of the main process within the Envoy sidecar. We then tell App Mesh to ignore traffic from that user ID via the `IgnoredUID` property of the `proxyConfiguration`, which prevents Envoy from trying to **proxy its own traffic**.
- The `APPMESH_RESOURCE_ARN` environment variable tells the Envoy sidecar which App Mesh resource it needs to implement. Here, it's trying to implement the `red` virtual node.
You may also notice that despite application traffic being proxied through Envoy, we're actually opening the task port on the `red-app` container, instead of on the `envoy-sidecar` container. This is because the Envoy sidecar, despite being labelled as a **proxy**, behaves slightly differently from the traditional network proxies you may already be familiar with. This difference manifests in the way the proxy handles the task traffic it intercepts. To put it simply:
***For incoming traffic to the task:***
1. Incoming traffic hits the port exposed on the application container.
2. Before the application container actually receives that traffic, IP table rules intercept the traffic and re-route it to Envoy.
3. Envoy processes the traffic, then sends it back to the application container.
***For traffic leaving the task:***
1. Outgoing traffic is sent by the application container.
2. Before traffic leaves the task, IP table rules intercept the traffic and send it to Envoy.
3. Envoy processes the traffic, and if it is taking a valid path through the mesh (or to an out-of-mesh endpoint if allowed), returns it to the point where it was intercepted so it can proceed out of the task.
4. If the traffic is not taking a valid path through the mesh (e.g. trying to reach a service not configured as a backend for this node), Envoy stops the traffic instead.
Let's also configure the task definition for the `blue` virtual node implementation.
```json=
// 02e-blue-td.json
{
    "family": "blue_node",
    "taskRoleArn": "<task-role-arn>",
    "executionRoleArn": "<task-execution-role-arn>",
    "networkMode": "awsvpc",
    "containerDefinitions": [
        {
            "name": "blue-app",
            "image": "forsakenidol/colorapp",
            "cpu": 128,
            "memory": 128,
            "portMappings": [
                {
                    "containerPort": 80,
                    "protocol": "tcp",
                    "name": "color-port"
                }
            ],
            "essential": true,
            "environment": [
                {
                    "name": "COLOR",
                    "value": "blue"
                }
            ],
            "dependsOn": [
                {
                    "containerName": "envoy-sidecar",
                    "condition": "HEALTHY"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "color-mesh",
                    "awslogs-region": "<your-region>",
                    "awslogs-create-group": "true",
                    "awslogs-stream-prefix": "blue-app-container"
                }
            },
            "healthCheck": {
                "command": [ "CMD-SHELL", "curl -f http://localhost/health/ || exit 1" ],
                "interval": 5,
                "timeout": 5,
                "retries": 3
            }
        },
        {
            "name": "envoy-sidecar",
            "image": "840364872350.dkr.ecr.ap-southeast-2.amazonaws.com/aws-appmesh-envoy:v1.27.0.0-prod",
            "cpu": 128,
            "memory": 384,
            "essential": true,
            "environment": [
                {
                    "name": "APPMESH_RESOURCE_ARN",
                    "value": "arn:aws:appmesh:<your-region>:<your-account-id>:mesh/color-mesh/virtualNode/blue"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "color-mesh",
                    "awslogs-region": "<your-region>",
                    "awslogs-create-group": "true",
                    "awslogs-stream-prefix": "blue-envoy-sidecar"
                }
            },
            "startTimeout": 30,
            "healthCheck": {
                "command": [ "CMD-SHELL", "curl -s http://localhost:9901/server_info | grep state | grep -q LIVE" ],
                "interval": 5,
                "retries": 3,
                "startPeriod": 10,
                "timeout": 2
            },
            "user": "1337"
        }
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "256",
    "memory": "512",
    "proxyConfiguration": {
        "type": "APPMESH",
        "containerName": "envoy-sidecar",
        "properties": [
            {
                "name": "IgnoredUID",
                "value": "1337"
            },
            {
                "name": "AppPorts",
                "value": "80"
            },
            {
                "name": "ProxyIngressPort",
                "value": "15000"
            },
            {
                "name": "ProxyEgressPort",
                "value": "15001"
            },
            {
                "name": "EgressIgnoredIPs",
                "value": "169.254.170.2,169.254.169.254"
            }
        ]
    }
}
```
The two task definitions are nearly identical - the only differences are the parameters that pertain to the color being served and the name of the main application container.
Let's register these task definitions.
```shell
aws ecs register-task-definition --cli-input-json file://02e-red-td.json
aws ecs register-task-definition --cli-input-json file://02e-blue-td.json
```
### Launching the Implementation
Now, we can create services from our task definitions to actually launch the virtual node implementations into our ECS cluster. Create the service JSON specifications for each of the `red` and `blue` services as follows.
```json=
// 02f-red-service.json
{
    "cluster": "<ecs-cluster-name>",
    "serviceName": "red-svc",
    "taskDefinition": "red_node",
    "serviceRegistries": [
        { "registryArn": "<your-red-service-cloud-map-arn>" }
    ],
    "desiredCount": 1,
    "launchType": "FARGATE",
    "platformVersion": "LATEST",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [ "<list-of-desired-private-subnets-in-string-form>" ],
            "securityGroups": [ "<your-color-app-security-group>" ],
            "assignPublicIp": "DISABLED"
        }
    },
    "schedulingStrategy": "REPLICA",
    "deploymentController": { "type": "ECS" },
    "enableExecuteCommand": true
}
```
```json=
// 02f-blue-service.json
{
    "cluster": "<ecs-cluster-name>",
    "serviceName": "blue-svc",
    "taskDefinition": "blue_node",
    "serviceRegistries": [
        { "registryArn": "<your-blue-service-cloud-map-arn>" }
    ],
    "desiredCount": 1,
    "launchType": "FARGATE",
    "platformVersion": "LATEST",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [ "<list-of-desired-private-subnets-in-string-form>" ],
            "securityGroups": [ "<your-color-app-security-group>" ],
            "assignPublicIp": "DISABLED"
        }
    },
    "schedulingStrategy": "REPLICA",
    "deploymentController": { "type": "ECS" },
    "enableExecuteCommand": true
}
```
A few things to note right off the bat:
- Remember the ARNs we fetched of the `red` and `blue` Cloud Map services in our `colors.local` namespace? We need to specify them in the service definitions above as values for the `registryArn` keys.
- Launch all resources into private subnets - in the interests of security, you should never launch a service (in App Mesh or otherwise) into public subnets. Even if you're making the service publicly accessible, this is generally done through a gateway mechanism, such as an [**Elastic Load Balancer**](https://aws.amazon.com/elasticloadbalancing/), the listeners for which will be the ones in the public subnets - not the application itself (*we're not fronting* `red` *or* `blue` *with load balancers, but this is just a good tip to keep in mind*).
Let's create these services.
```shell=
aws ecs create-service --cli-input-json file://02f-red-service.json
aws ecs create-service --cli-input-json file://02f-blue-service.json
```
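Before moving on to testing, it can be handy to wait for both services to reach a steady state and confirm that each one has a running task - a quick sketch using the ECS waiters:
```sh
# Block until both services are stable, then print their task counts.
aws ecs wait services-stable --cluster <ecs-cluster-name> --services red-svc blue-svc
aws ecs describe-services --cluster <ecs-cluster-name> --services red-svc blue-svc \
  --query "services[].{name:serviceName,desired:desiredCount,running:runningCount}"
```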
### Testing the Implementation
Congratulations! You've launched the first 2 virtual nodes into your mesh! That being said, how do we test virtual-resource-to-virtual-resource connectivity within the mesh? That's where [**ECS Exec**](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html#ecs-exec-enabling-and-using) comes into play for our test architecture. We'll open a shell into each virtual node's container and use `curl` to simulate a connection to the other virtual service, in lieu of another application connection such as a database or microservice connection.
In the ECS console, first, grab the task ID for your `blue` service, then run the following command.
```sh=
aws ecs execute-command --cluster <cluster-name> \
--task <blue-task-id> \
--container blue-app \
--interactive \
--command "/bin/sh"
```
(*If you're on Windows, and you run into a command-not-found issue with* `/bin/sh` *, try changing this to just* `sh`*.*)
If ECS Exec has been correctly configured, you will be at a shell. Execute the following 3 commands:
1. `curl localhost` - This should print `blue`, since we're serving the `blue` application from this container.
2. `nslookup red.colors.local` - This should give you the private IP address of the `red` ECS task.
3. Finally, `curl red.colors.local` - This should print `red`, because the `red.colors.local` virtual service was configured as a backend to our `blue` virtual node.
Now, exit out of the `blue` shell, grab the task ID for your `red` service, then run the following command.
```sh=
aws ecs execute-command --cluster <cluster-name> \
--task <red-task-id> \
--container red-app \
--interactive \
--command "/bin/sh"
```
Execute the following 4 commands.
1. `curl localhost` - This should print `red`, since we're serving the `red` application from this container.
2. `nslookup blue.colors.local` - This should give you the private IP address of the `blue` ECS task.
3. `curl blue.colors.local` - This should fail, typically with a `recv` error or an empty reply from the server, because the `red` virtual node does not have any virtual service backends.
4. `while true; do curl blue.colors.local; sleep 1; done;`. This should put you into a loop where the `curl` command continuously fails. Leave this running in the shell for now and read the next steps below.
While that final command is running, navigate to the `color-mesh` in the App Mesh console, select the `red` virtual node, click on `Edit` in the top right-hand corner, and scroll down to `Service backends - recommended`. This is where we add backends to virtual nodes in the console. Expand this section, select `Add backend`, and specify the `blue.colors.local` virtual service name. Leave all other values at their defaults and select `Save` at the bottom of the screen.
Within a minute or so, the output of command loop #4 above should start printing `blue` instead of the persistent error you previously encountered, since we've now added the `blue.colors.local` virtual service as a backend to our `red` virtual node. The reason this doesn't happen instantly is that Envoy needs time to retrieve the updated configuration from the App Mesh service. Now, remove the virtual service backend we just added (so that the `red` virtual node has no backends), and watch the command loop return to spitting out the same error as before.
**Congratulations!** You've successfully configured the first 2 virtual nodes and virtual services in your mesh, and demonstrated asymmetrical virtual node communication between the `red` and `blue` services in your service mesh.
### Aside: Envoy Metrics
The Envoy container, within each of our tasks, exposes an [**administration interface**](https://www.envoyproxy.io/docs/envoy/v1.27.2/operations/admin) on port `9901`. We can `curl localhost:9901` from within the application container to get some key information about the execution of the Envoy container. Here are a few key metrics to be aware of:
- `curl localhost:9901/server_info`: Information about the currently-running instance of Envoy in this task.
- `curl localhost:9901/ready`: Prints the [**running state of the Envoy sidecar**](https://www.envoyproxy.io/docs/envoy/v1.27.2/api-v3/admin/v3/server_info.proto#envoy-v3-api-msg-admin-v3-serverinfo). This is particularly useful as a health check for the container.
- `curl localhost:9901/stats | grep connected`: Prints the `control_plane.connected_state` variable, which is set to 1 if the sidecar is connected to App Mesh. Useful for determining issues with the task's network connection.
- `curl localhost:9901/stats/prometheus`: Exposes Envoy container statistics in a **Prometheus-compatible** format, which can be plugged into an upstream Prometheus scraper.
You can find a full list of supported endpoints in the [**Envoy documentation**](https://www.envoyproxy.io/docs/envoy/v1.27.2/operations/admin). Envoy exposes a huge number of metrics that can lend insight into the operation of the proxy and downstream application in the same task, fulfilling the **metrics** pillar of observability for meshed applications. In a later section, we'll have a look at how we can send these metrics to an upstream observability backend using [**OpenTelemetry**](https://opentelemetry.io/).
## 3. The Virtual Router
Now, we'll introduce a [**virtual router**](https://docs.aws.amazon.com/app-mesh/latest/userguide/virtual_routers.html) to our mesh, and with it, the final 2 virtual nodes - `green` and `yellow`. Virtual routers cannot route traffic to other virtual services - they can only send traffic to **virtual nodes**.
### Green and Yellow
Before we can configure the virtual nodes, we first need to set up the Cloud Map services for both of these nodes in the `colors.local` namespace.

We'll take a similar approach to when we configured the `red` and `blue` services in our manifests. We'll start with the first service...
```json=
// 03a-router.json
{
    "Name": "router",
    "NamespaceId": "<cloud-map-namespace-id>",
    "Description": "Green color service, and DNS record for multiple colors.",
    "DnsConfig": {
        "RoutingPolicy": "MULTIVALUE",
        "DnsRecords": [
            {
                "Type": "A",
                "TTL": 300
            }
        ]
    }
}
```
```sh=
aws servicediscovery create-service --cli-input-json file://03a-router.json
```
... then move onto the second.
```json=
// 03a-yellow.json
{
    "Name": "yellow",
    "NamespaceId": "<cloud-map-namespace-id>",
    "Description": "Yellow color service.",
    "DnsConfig": {
        "RoutingPolicy": "MULTIVALUE",
        "DnsRecords": [
            {
                "Type": "A",
                "TTL": 300
            }
        ]
    }
}
```
```sh=
aws servicediscovery create-service --cli-input-json file://03a-yellow.json
```
At this point, you're probably wondering - **why did I call the Cloud Map service corresponding to the `green` node `router` instead of calling it `green`?** Hold onto that question, because we're going to explain it in just a bit. (*You may be able to glean the answer from the service's description, but if you're still unsure, don't worry! We'll cover this after we configure the virtual router.*)
Now, we can set up the `green` and `yellow` virtual nodes. Once again, notice the similarities between the virtual node specifications, first for the `green` virtual node...
```json=
// 03b-green-node.json
{
    "virtualNodeName": "green",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ],
        "serviceDiscovery": {
            "awsCloudMap": {
                "namespaceName": "colors.local",
                "serviceName": "router"
            }
        },
        "logging": {
            "accessLog": {
                "file": {
                    "path": "/dev/stdout"
                }
            }
        }
    }
}
```
```sh=
aws appmesh create-virtual-node --cli-input-json file://03b-green-node.json
```
... then for the `yellow` virtual node.
```json=
// 03b-yellow-node.json
{
    "virtualNodeName": "yellow",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ],
        "serviceDiscovery": {
            "awsCloudMap": {
                "namespaceName": "colors.local",
                "serviceName": "yellow"
            }
        },
        "logging": {
            "accessLog": {
                "file": {
                    "path": "/dev/stdout"
                }
            }
        }
    }
}
```
```sh=
aws appmesh create-virtual-node --cli-input-json file://03b-yellow-node.json
```
*Note that we haven't specified a custom logging string for these virtual nodes - they will inherit the [**default format string**](https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage) in the Envoy documentation.*
Now let's move on to ECS. Instead of pasting the task definition in its entirety as was done previously with the `red` and `blue` nodes, I'll instead highlight the sections of the task definition you should change, since the task definitions for all 4 nodes are very similar.
- `family` - This should be unique for each task definition.
```json
"family": "green_node",
```
- The `name` of the main application container.
```json
"name": "green-app",
```
- The `environment` variable `COLOR` for the main application container.
```json
"environment": [
{
"name": "COLOR",
"value": "green"
}
],
```
- The CloudWatch log stream prefixes for the `colorapp` and `envoy` containers. Here's an example for the `colorapp` container.
```json
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "color-mesh",
"awslogs-region": "<your-region>",
"awslogs-create-group": "true",
"awslogs-stream-prefix": "green-app-container"
}
},
```
- The `APPMESH_RESOURCE_ARN` environment variable for the Envoy sidecar, so it knows which virtual node to implement.
```json
"environment" : [
{
"name" : "APPMESH_RESOURCE_ARN",
"value" : "arn:aws:appmesh:<your-region>:<your-account-id>:mesh/color-mesh/virtualNode/green"
}
],
```
Once both task definitions (*I've called them* `03c-green-td.json` *and* `03c-yellow-td.json`) have been registered, we can create ECS services from them. First for the `green` service...
```json=
// 03d-green-service.json
{
    "cluster": "<cluster-name>",
    "serviceName": "green-svc",
    "taskDefinition": "green_node",
    "serviceRegistries": [
        { "registryArn": "<router-service-cloud-map-arn>" }
    ],
    "desiredCount": 1,
    "launchType": "FARGATE",
    "platformVersion": "LATEST",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [ <subnets> ],
            "securityGroups": [ <colorapp-security-group> ],
            "assignPublicIp": "DISABLED"
        }
    },
    "schedulingStrategy": "REPLICA",
    "deploymentController": { "type": "ECS" },
    "enableExecuteCommand": true
}
```
```sh=
aws ecs create-service --cli-input-json file://03d-green-service.json
```
... then for the `yellow` service.
```json=
// 03d-yellow-service.json
{
    "cluster": "<cluster-name>",
    "serviceName": "yellow-svc",
    "taskDefinition": "yellow_node",
    "serviceRegistries": [
        { "registryArn": "<yellow-service-cloud-map-arn>" }
    ],
    "desiredCount": 1,
    "launchType": "FARGATE",
    "platformVersion": "LATEST",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [ <subnets> ],
            "securityGroups": [ <colorapp-security-group> ],
            "assignPublicIp": "DISABLED"
        }
    },
    "schedulingStrategy": "REPLICA",
    "deploymentController": { "type": "ECS" },
    "enableExecuteCommand": true
}
```
```sh=
aws ecs create-service --cli-input-json file://03d-yellow-service.json
```
At this point in time, we've introduced the `green` and `yellow` virtual nodes to our mesh. However, these nodes are isolated:
- They have no backends, and so will not be able to reach other virtual services in the mesh.
- They have not been configured as a backend for any of the other virtual nodes (and they do not have a virtual service in front of them for this to be possible).

When we configured the `red` and `blue` virtual nodes, we fronted them directly with corresponding virtual services `red.colors.local` and `blue.colors.local`. In this section, however, we're going to front the `green` and `yellow` virtual nodes with a `color-router` virtual router, which in turn is going to sit behind a virtual service `router.colors.local`. The `color-router` is going to route incoming traffic to the `green` and `yellow` virtual nodes.
### Configuring the Virtual Router
As we mentioned previously, a virtual service can serve traffic to virtual node backends in 2 different ways:
1. Directly to a **virtual node**, presenting a 1-to-1 connection between the virtual service and a virtual node.
2. To one or more virtual nodes by means of a **virtual router**, which sits between the virtual service and the pool of virtual nodes and allows for more advanced request routing to the nodes.
We've already explored the first method when we set up the `red` and `blue` virtual nodes and configured their corresponding virtual services to sit directly in front of those nodes. In this section, we're going to explore option #2.

We'll start by configuring the `color-router` virtual router in App Mesh...
```json=
// 03e-color-router.json
{
    "virtualRouterName": "color-router",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ]
    }
}
```
```sh
aws appmesh create-virtual-router --cli-input-json file://03e-color-router.json
```
... add a `color-route` to our mesh that will listen for incoming GET requests on port 80 and distribute traffic to our `green` and `yellow` virtual nodes at a 4:1 (80/20) split respectively...
```json=
// 03f-color-route.json
{
    "routeName": "color-route",
    "virtualRouterName": "color-router",
    "meshName": "color-mesh",
    "spec": {
        "priority": 1,
        "httpRoute": {
            "match": {
                "method": "GET",
                "port": 80,
                "prefix": "/"
            },
            "action": {
                "weightedTargets": [
                    {
                        "port": 80,
                        "virtualNode": "green",
                        "weight": 4
                    },
                    {
                        "port": 80,
                        "virtualNode": "yellow",
                        "weight": 1
                    }
                ]
            }
        }
    }
}
```
```sh
aws appmesh create-route --cli-input-json file://03f-color-route.json
```
... and finally, front the virtual router with a virtual service.
```json=
// 03g-color-service.json
{
    "virtualServiceName": "router.colors.local",
    "meshName": "color-mesh",
    "spec": {
        "provider": {
            "virtualRouter": {
                "virtualRouterName": "color-router"
            }
        }
    }
}
```
```sh
aws appmesh create-virtual-service --cli-input-json file://03g-color-service.json
```
### Explaining FQDNs and Virtual Routers
At this point in time, we can discuss why I chose to name the `green` virtual node's Cloud Map service `router` instead of just calling it `green`. In truth, it actually doesn't matter what you name the Cloud Map service, as long as the naming is **consistent** with other resources further down the line. **What does that mean?**
Our virtual router, despite routing traffic to **more than one** virtual node, is still being fronted by a virtual service that needs an FQDN that is **resolvable by virtual nodes within the service mesh**. Because our service discovery mechanism is handled by the **Cloud Map namespace** `colors.local`, we'll need a Cloud Map service called `router` in this namespace, and that service will need **at least 1 service instance entry**. This is so that when you make a DNS request for `router.colors.local` (e.g. using `nslookup`, which the `forsakenidol/colorapp` image has access to) from a virtual node, one or more entries are returned.
Envoy, despite knowing where to route your traffic when you make a request to `router.colors.local`, is still at the mercy of the upstream DNS server to actually return a response to the virtual node from which the DNS request for `router.colors.local` was made. If the `router` Cloud Map service doesn't exist, the DNS request will fail, and if this happens, no HTTP request will be made, and **Envoy will not have an HTTP request to intercept**.
<div style="background-color: #fa867f; padding: 2rem 2rem; margin-bottom: 1.5rem;">
<b>Note</b>: <b>Do not</b> proxy DNS traffic on port 53 through Envoy. You might be tempted to do this to simplify virtual router configuration (and possibly avoid the aforementioned DNS record setup), but the Envoy proxy does not have DNS server functionality and will not know how to handle the incoming DNS traffic, not least because we haven't defined mappings for port 53. This is one of the reasons why App Mesh service discovery relies on an upstream DNS service separate from Envoy (Cloud Map in our circumstances) to provide it with the actual target addresses for a given FQDN.
</div>
A typical deployment pattern that is made simple through the use of a virtual router is covered in the [**official AWS App Mesh Workshop**](https://www.appmeshworkshop.com/servicediscovery/). It involves a **canary** deployment, where the original `crystal` service served through a load balancer constitutes the old application, and the `crystal` service using service discovery constitutes the new application. The old application must be replaced by the new application with minimal downtime, a condition that can be met by using a virtual router and a route that slowly shifts traffic from the old application to the new one by modifying the target weights in increments. In this scenario, the virtual service in front of the virtual router is given the **shared name of the application** - `crystal.appmeshworkshop.hosted.local` - which makes sense in this context, given that the application remains the same. We could just as well have named our router's fronting service `green.colors.local`, as long as **the Cloud Map service is also called `green`**.
In fact, even though the virtual service in front of the virtual router requires a DNS-resolvable FQDN, the actual targets returned for that FQDN **do not actually need to be served** by the virtual router - the DNS record just has to exist! We'll explore this further in our testing.
### Testing the Virtual Router
Before we can test our router, we need to add our `router.colors.local` virtual service as a backend to one of the nodes so we can `curl` this service. Let's add it to the `red` virtual node.
```json=
// 03h-update-red.json
{
    "virtualNodeName": "red",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ],
        "serviceDiscovery": {
            "awsCloudMap": {
                "namespaceName": "colors.local",
                "serviceName": "red"
            }
        },
        "logging": {
            "accessLog": {
                "file": {
                    "format": {
                        "text": "[%START_TIME%] %REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL% %RESPONSE_CODE%\n"
                    },
                    "path": "/dev/stdout"
                }
            }
        },
        "backends": [
            {
                "virtualService": {
                    "virtualServiceName": "router.colors.local"
                }
            }
        ]
    }
}
```
```sh
aws appmesh update-virtual-node --cli-input-json file://03h-update-red.json
```
The virtual node's specification here is exactly the same as before, with the exception of the `backends` section we've just added, in which we've specified the `router.colors.local` virtual service.
Let's now **ECS Exec** into the red virtual node's ECS task to run some commands.
```sh
aws ecs execute-command --cluster <cluster-name> \
--task <red-task-id> \
--container red-app \
--interactive \
--command "/bin/sh"
```
- `nslookup router.colors.local` - This should return the private IP address of the `green` virtual node's ECS task, because we registered it to the `router` Cloud Map service.
- `while true; do curl router.colors.local && echo; done` - This will repeatedly hit the `router.colors.local` DNS name. Notice the distribution between `green` and `yellow` at the expected 4:1 ratio as per our route configuration, despite the previous `nslookup` command only returning one value - this is because Envoy is intercepting the request and routing it as per our virtual router configuration, instead of simply allowing the request to be fulfilled by the resource returned by Cloud Map.
Leave the command loop running and navigate to the virtual router we just configured in the App Mesh service in the AWS web console. Navigate to the "Routes" tab, select the `color-route`, and click on the `Edit` button in the top-right. Here, we'll be able to modify the route's **Target Configuration**. Let's experiment with some of the following options while observing the output of the command loop.
- Set the `green` virtual node's relative weight to 0. Notice the percentage change that will result in 100% of the traffic going to the yellow node.
- Replace the `green` virtual node with the `blue` virtual node and add a positive, non-zero weight value to this target. Now, the target registered behind the `router` Cloud Map service (the `green` virtual node's ECS task) isn't being served by this router. Observe the lack of `green` in the `while` loop's output.
*(Remember to click the **Save** button at the bottom of the **Edit** menu after each change!)*
Experiment with different settings for the route's weighted targets and try to predict the traffic distribution each setting will result in. Once you're done, return the router's traffic distribution to the 80/20 split of `green`-to-`yellow` that it was at before.
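If you'd prefer to keep this experimentation in line with the infrastructure-as-code approach we've been following, the same weight changes can be applied with `update-route` instead of the console. Here's a sketch that shifts the route to an even 50/50 split - the `03i-update-color-route.json` file name is hypothetical, and the manifest simply mirrors `03f-color-route.json` with different weights.
```sh
# Hypothetical manifest name; mirrors 03f-color-route.json with a 50/50 weight split.
cat > 03i-update-color-route.json << 'EOF'
{
    "routeName": "color-route",
    "virtualRouterName": "color-router",
    "meshName": "color-mesh",
    "spec": {
        "priority": 1,
        "httpRoute": {
            "match": { "method": "GET", "port": 80, "prefix": "/" },
            "action": {
                "weightedTargets": [
                    { "port": 80, "virtualNode": "green", "weight": 1 },
                    { "port": 80, "virtualNode": "yellow", "weight": 1 }
                ]
            }
        }
    }
}
EOF

aws appmesh update-route --cli-input-json file://03i-update-color-route.json
```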
## 4. The Virtual Gateway
Up until this point, we've only called the meshed virtual services from other virtual nodes. Now, we're going to introduce the final key App Mesh component - the [**virtual gateway**](https://docs.aws.amazon.com/app-mesh/latest/userguide/virtual_gateways.html) - to expose our services outside of the mesh.

### Declaring the Architecture
As with all virtual resources in App Mesh, we must first begin by declaring the virtual gateway.
```json=
// 04a-color-gateway.json
{
    "virtualGatewayName": "color-gateway",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ],
        "logging": {
            "accessLog": {
                "file": { "path": "/dev/stdout" }
            }
        }
    }
}
```
```sh
aws appmesh create-virtual-gateway --cli-input-json file://04a-color-gateway.json
```
Alone, the virtual gateway doesn't have much of a function - it has a listener, but once traffic reaches that listener, the gateway currently doesn't know where to send it. We need to attach a [**gateway route**](https://docs.aws.amazon.com/app-mesh/latest/userguide/gateway-routes.html) that tells the virtual gateway where the traffic needs to be sent. Gateway routes have a lot of functionality for different route protocols, and they can match specific parameters in the request - in this tutorial, we'll demonstrate **prefix matching** on the request path.
```json=
// 04b-color-gateway-route-1.json
{
    "gatewayRouteName": "color-gateway-route-1",
    "virtualGatewayName": "color-gateway",
    "meshName": "color-mesh",
    "spec": {
        "httpRoute": {
            "action": {
                "target": {
                    "virtualService": {
                        "virtualServiceName": "router.colors.local"
                    }
                }
            },
            "match": {
                "prefix": "/router"
            }
        }
    }
}
```
```sh
aws appmesh create-gateway-route --cli-input-json file://04b-color-gateway-route-1.json
```
In the manifest above:
- We attach a gateway route to our `color-gateway` which looks for requests hitting the `/router` prefix on the virtual gateway.
- The inbound `/router` request must be made using the `HTTP/1` protocol.
- When a request matches this prefix, the route will send it to the `router.colors.local` service. Recall from the previous section that this service was implemented using a virtual router, with 2 backends - `green` and `yellow`.
Let's also add a root path `/` route that will direct traffic to another service - `blue.colors.local`.
```json=
// 04b-color-gateway-route-2.json
{
    "gatewayRouteName": "color-gateway-route-2",
    "virtualGatewayName": "color-gateway",
    "meshName": "color-mesh",
    "spec": {
        "httpRoute": {
            "action": {
                "target": {
                    "virtualService": {
                        "virtualServiceName": "blue.colors.local"
                    }
                }
            },
            "match": {
                "method": "GET",
                "port": 80,
                "prefix": "/"
            }
        }
    }
}
```
```sh
aws appmesh create-gateway-route --cli-input-json file://04b-color-gateway-route-2.json
```
In this scenario, we have a root path `/` route and a `/router` route. When a valid request hits the virtual gateway, it is matched against the **longer** prefix, meaning that a request for `/router` will not be matched against the root path because `/` is shorter than `/router`. This prevents a single request from being answered by more than one route.
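You can review both gateway routes (and confirm which prefixes and targets they carry) from the CLI:
```sh
aws appmesh list-gateway-routes --mesh-name color-mesh --virtual-gateway-name color-gateway
aws appmesh describe-gateway-route --mesh-name color-mesh \
  --virtual-gateway-name color-gateway --gateway-route-name color-gateway-route-2
```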
### Implementing the Architecture
Now that we've declared our virtual gateway in App Mesh, let's implement it in ECS. While our virtual nodes were previously implemented by spinning up an Envoy proxy as a **sidecar** to the main application container in our tasks, a virtual gateway is implemented by configuring Envoy to **run alone** as the main container in a task.
Incoming traffic will reach Envoy directly, which will be listening on the same port we configured in `04a-color-gateway.json`, and Envoy will implement the upstream virtual gateway using [**the same environment variable**](https://docs.aws.amazon.com/app-mesh/latest/userguide/envoy-config.html#envoy-required-config) as previously - `APPMESH_RESOURCE_ARN`. This means that we'll be opening ports on the Envoy container directly, instead of on another container in the task, because Envoy is the main container for the virtual gateway.
We'll start with the task definition.
```json=
// 04c-gateway-td.json
{
    "family": "gateway",
    "taskRoleArn": "<ecs-task-role-arn>",
    "executionRoleArn": "<ecs-task-execution-role-arn>",
    "networkMode": "awsvpc",
    "containerDefinitions": [
        {
            "name": "envoy-proxy",
            "image": "840364872350.dkr.ecr.ap-southeast-2.amazonaws.com/aws-appmesh-envoy:v1.27.0.0-prod",
            "cpu": 256,
            "memory": 512,
            "portMappings": [
                {
                    "containerPort": 80,
                    "protocol": "tcp",
                    "name": "gateway-port"
                }
            ],
            "essential": true,
            "environment": [
                {
                    "name": "APPMESH_RESOURCE_ARN",
                    "value": "arn:aws:appmesh:<your-region>:<your-account-id>:mesh/color-mesh/virtualGateway/color-gateway"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "color-mesh",
                    "awslogs-region": "<your-region>",
                    "awslogs-create-group": "true",
                    "awslogs-stream-prefix": "04e-gateway"
                }
            },
            "startTimeout": 30,
            "healthCheck": {
                "command": [ "CMD-SHELL", "curl -s http://localhost:9901/server_info | grep state | grep -q LIVE" ],
                "interval": 5,
                "retries": 3,
                "startPeriod": 10,
                "timeout": 2
            }
        }
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "256",
    "memory": "512"
}
```
```sh
aws ecs register-task-definition --cli-input-json file://04c-gateway-td.json
```
- Notice the updated value of the `APPMESH_RESOURCE_ARN` environment variable, to which we're now passing the ARN of the virtual gateway.
- Unlike in previous task definitions for virtual nodes, the Envoy container in the virtual gateway doesn't have to share its resource budget with any other container, and hence we can allocate it the lion's share of the task's resource requirements.
- Notice the lack of a `proxyConfiguration` field - there isn't a separate application container whose traffic needs to be transparently redirected through Envoy this time. Rather, Envoy **is** the main application container to which traffic will be sent, so there's nothing for a proxy configuration to intercept.
Before we can bring up a task that implements our virtual gateway, we need to configure a **security group** that allows traffic on port 80 to reach the task. This is the first mesh resource that's intended to be reachable from outside the mesh (and outside the VPC), so its security group needs an inbound rule that's open to the outside world. We'll call it `gateway-sg`.
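As a minimal sketch of how you might create it from the CLI - assuming your mesh's VPC ID, and that you're comfortable allowing HTTP from anywhere (tighten the CIDR range if you're not):
```sh
# Create the gateway security group in the mesh VPC.
aws ec2 create-security-group --group-name gateway-sg \
    --description "Public HTTP access to the virtual gateway" \
    --vpc-id <your-mesh-vpc-id>
# Allow inbound TCP traffic on port 80 from anywhere.
aws ec2 authorize-security-group-ingress --group-id <gateway-sg-id> \
    --protocol tcp --port 80 --cidr 0.0.0.0/0
```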

Now, let's bring up an instance of this implementation.
```json=
// 04d-gateway-service-public-subnets.json
{
    "cluster": "<cluster-name>",
    "serviceName": "gateway-svc",
    "taskDefinition": "gateway",
    "desiredCount": 1,
    "launchType": "FARGATE",
    "platformVersion": "LATEST",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [ <list-of-public-subnets> ],
            "securityGroups": [ <colorapp-sg-id>, <gateway-sg-id> ],
            "assignPublicIp": "ENABLED"
        }
    },
    "schedulingStrategy": "REPLICA",
    "deploymentController": { "type": "ECS" }
}
```
```sh
aws ecs create-service --cli-input-json file://04d-gateway-service-public-subnets.json
```
- Because we want the gateway to be publicly reachable, we must launch it into the public subnets of the same VPC in which the other 4 nodes in our service mesh reside. At present, this is the only public task in our entire mesh.
- The other security group we attach to this service is the `colorapp` security group - the same one we previously attached to the 4 tasks that implement the virtual nodes in our color mesh. It allows the virtual gateway implementation to route incoming traffic to the relevant backend(s) as per the gateway routes we configured earlier, while `gateway-sg` allows us to reach the gateway task from the outside.
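The next section queries the gateway task by its public IP address. If you'd rather not dig through the console for it, here's a hedged sketch of how you might look it up from the CLI - it assumes your cluster name and the `gateway-svc` service name used above:
```sh
# Find the task started by the gateway service, then its ENI, then the ENI's public IP.
TASK_ARN=$(aws ecs list-tasks --cluster <cluster-name> --service-name gateway-svc \
    --query 'taskArns[0]' --output text)
ENI_ID=$(aws ecs describe-tasks --cluster <cluster-name> --tasks $TASK_ARN \
    --query "tasks[0].attachments[0].details[?name=='networkInterfaceId'].value" --output text)
aws ec2 describe-network-interfaces --network-interface-ids $ENI_ID \
    --query 'NetworkInterfaces[0].Association.PublicIp' --output text
```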
### Testing the Implementation
Because we enabled public IP assignment in our gateway service, we can access the task that's been spun up by this service via its public IP address. From your local development machine this time (*and assuming you have public internet access - how else would you be reading this guide?*), let's run the following command to test the virtual gateway:
```sh
curl <task-public-ip>
```
We're hitting one of the 2 routes that the gateway has been configured to listen on - the root path - which should return `blue`. Let's try the other route.
```sh
while true; do curl <task-public-ip>/router && echo; sleep 1; done
```
*The `echo` command is just to space the loop's output.*
When we query this route, we should see `green` and `yellow` at roughly the same 4:1 ratio we observed when querying the `router.colors.local` service from within the mesh (assuming you haven't modified the router's distribution, or have since reset it back to its default weights).
```
// Example output
green
green
green
green
yellow
green
yellow
...
```
Like other aspects of the mesh, we can modify a gateway route in-flight while the above loop is running. Navigate to the `color-gateway` virtual gateway in the App Mesh console, select the **Gateway routes** tab, select the `color-gateway-route-1` gateway route and click **Edit**. Experiment with the virtual service that this gateway route sends traffic to, as well as the prefix the route listens on. Once you're ready, move on to the next section.
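If you'd rather experiment from the CLI, `update-gateway-route` achieves the same thing. As a hedged sketch - this example points the `/router` prefix route at `blue.colors.local` instead of the router-backed service, mirroring the spec shape used in `04b-color-gateway-route-2.json`:
```sh
# Point the /router prefix route at the blue virtual service instead.
aws appmesh update-gateway-route --mesh-name color-mesh \
    --virtual-gateway-name color-gateway --gateway-route-name color-gateway-route-1 \
    --spec '{"httpRoute": {"match": {"prefix": "/router"}, "action": {"target": {"virtualService": {"virtualServiceName": "blue.colors.local"}}}}}'
```
Remember to revert the change (or re-create the route from its original manifest) before moving on, so your loop output continues to match the rest of this guide.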
### Load Balancing the Virtual Gateway
Awesome - we now have an ECS task that implements our color mesh's virtual gateway with 2 routes, one to each of the `blue` and `router` virtual services. However, what happens if our gateway becomes really popular? The first thing we'd need to do is scale up the service, introducing more tasks to handle the higher traffic load - but then we'd have multiple access points: one public IP for every task implementing the virtual gateway. We need a **single access point** for all of these tasks, and we can improve our mesh's security by only allowing gateway access through that access point. We can achieve both of these design requirements by using [**Elastic Load Balancing**](https://aws.amazon.com/elasticloadbalancing/) in conjunction with ECS.
<div style="background-color: #88f7b3; padding: 2rem 2rem; margin-bottom: 1.5rem;">
<b>Note</b>: AWS recommends <a href="https://docs.aws.amazon.com/app-mesh/latest/userguide/virtual_gateways.html#deploy-virtual-gateway"><b>using an NLB</b></a> to load balance tasks implementing a virtual gateway with App Mesh.
</div>
Let's bring down the 1-task service implementation of our virtual gateway and replace it with a [**Network Load Balancer**](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/load-balancer-types.html#nlb) (NLB) solution - a sketch of the teardown follows below. We then need to create the NLB and its associated [**target group**](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html) before we can use these resources with an ECS service.
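As a minimal teardown sketch - assuming the service was created with the name `gateway-svc` earlier, and using the `--force` flag so ECS deletes the service without first scaling it to zero:
```sh
# Remove the single-task gateway service created earlier.
aws ecs delete-service --cluster <cluster-name> --service gateway-svc --force
```
With the old gateway service out of the way, we'll start by creating the NLB.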
```json=
// 04e-load-balancer.json
{
    "Name": "gateway-lb",
    "Subnets": [ <your-vpc-public-subnets> ],
    "SecurityGroups": [ <colorapp-sg>, <gateway-sg> ],
    "Scheme": "internet-facing",
    "Type": "network",
    "IpAddressType": "ipv4"
}
```
```sh
aws elbv2 create-load-balancer --cli-input-json file://04e-load-balancer.json
```
What have we just created? Let's review the load balancer specification.
- We are creating a `network` load balancer (NLB).
- Load balancers on AWS use [**Elastic Network Interfaces**](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html) as [**ELB nodes**](https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html#availability-zones), which are used to distribute traffic to targets. The `Subnets` entry allows us to tell the NLB which subnets in our mesh's VPC these nodes should be launched into. For a publicly-accessible load balancer, we will launch these nodes into the public subnets of our VPC.
- We move the `gateway-sg` security group to the load balancer, since this will be our new point of access into the mesh, and we also attach `colorapp-sg` to give the load balancer nodes permission to route traffic to the virtual gateway tasks in ECS. When we get to launching those tasks, we'll remove `gateway-sg` from them, as we won't be accessing those tasks directly anymore.
- Because we want this NLB to be internet-accessible, the `Scheme` must be `internet-facing`.
Load balancers need target groups, where we'll be able to register the targets that we want traffic routed to. Let's create a target group for our load balancer.
```json=
// 04e-target-group.json
{
    "Name": "gateway-tg",
    "Protocol": "TCP",
    "Port": 80,
    "VpcId": "<your-mesh-vpc-id>",
    "HealthCheckEnabled": true,
    "HealthCheckIntervalSeconds": 10,
    "HealthCheckTimeoutSeconds": 5,
    "HealthyThresholdCount": 2,
    "UnhealthyThresholdCount": 3,
    "TargetType": "ip",
    "IpAddressType": "ipv4"
}
```
```sh
aws elbv2 create-target-group --cli-input-json file://04e-target-group.json
```
- We have 2 options for registering targets to a target group - by instance ID, or IP address. Because our tasks run in Fargate (and they don't run on EC2 instances), we must register the targets by `ip`.
- We've defined a health check for our target group, but we haven't specified a port for the health check. The default behavior here is for the health check to be run on the same port on which the application will receive traffic, which is `"Port": 80`.
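Both `create-load-balancer` and `create-target-group` return the new resource's ARN in their output. If you've since lost track of them, here's a quick sketch for pulling the ARNs back out of the CLI, assuming the resource names used above:
```sh
# Retrieve the ARN of the NLB we just created.
aws elbv2 describe-load-balancers --names gateway-lb \
    --query 'LoadBalancers[0].LoadBalancerArn' --output text
# Retrieve the ARN of the target group we just created.
aws elbv2 describe-target-groups --names gateway-tg \
    --query 'TargetGroups[0].TargetGroupArn' --output text
```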
At this point in time, the NLB and target group are separate entities; there is no relationship between these components. We need to create a **listener** on the NLB to bind the target group - here's how we do that.
```json=
// 04f-nlb-listener.json
{
    "LoadBalancerArn": "<your-nlb-arn>",
    "Protocol": "TCP",
    "Port": 80,
    "DefaultActions": [
        {
            "Type": "forward",
            "TargetGroupArn": "<your-target-group-arn>"
        }
    ]
}
```
```sh
aws elbv2 create-listener --cli-input-json file://04f-nlb-listener.json
```
Remember to add your NLB and target group ARNs from the commands we ran previously. We are creating a listener that will accept TCP traffic on port 80 - the same port on which our virtual gateway has a route. Let's now create our gateway service in ECS again, but this time, we'll specify the target group in the service definition, allowing ECS to register the task IP addresses as targets.
```json=
// 04g-gateway-service-load-balanced.json
{
    "cluster": "<cluster-name>",
    "serviceName": "gateway-svc-load-balanced",
    "taskDefinition": "gateway",
    "loadBalancers": [
        {
            "targetGroupArn": "<your-target-group-arn>",
            "containerName": "envoy-proxy",
            "containerPort": 80
        }
    ],
    "desiredCount": 2,
    "launchType": "FARGATE",
    "platformVersion": "LATEST",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [ <list-of-private-subnets> ],
            "securityGroups": [ "<colorapp-sg>" ],
            "assignPublicIp": "DISABLED"
        }
    },
    "schedulingStrategy": "REPLICA",
    "deploymentController": { "type": "ECS" }
}
```
```sh
aws ecs create-service --cli-input-json file://04g-gateway-service-load-balanced.json
```
We've disabled public IP assignment for the tasks and moved them into the private subnets of our VPC to prevent out-of-band access, forcing all traffic originating from outside our VPC to go through the NLB in the public subnets.
Once the ECS tasks are running and their private IP addresses appear in the `gateway-tg` target group (indicating that ECS has successfully registered them as IP targets), you can `curl` the DNS name of your load balancer - or open it in a browser - and hit the same routes as before, `/` and `/router`. You should receive the same responses as when you queried the virtual gateway task's public IP address directly, except this time you're reaching those tasks through the NLB.
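Here's a hedged sketch for checking both of those things from the CLI - target registration and the NLB's DNS name - assuming the resource names used earlier in this section:
```sh
# Confirm that both gateway tasks are registered and passing their health checks.
aws elbv2 describe-target-health --target-group-arn <your-target-group-arn>
# Grab the NLB's public DNS name and hit both gateway routes through it.
NLB_DNS=$(aws elbv2 describe-load-balancers --names gateway-lb \
    --query 'LoadBalancers[0].DNSName' --output text)
curl http://$NLB_DNS/
curl http://$NLB_DNS/router
```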
**Congratulations!** You've just configured a simple service mesh using App Mesh on Amazon ECS. Let's recap the architecture so far.
1. We started by setting up the Cloud Map **service discovery namespace** and the parent **service mesh**.
2. We introduced the first 2 **virtual nodes**, `red` and `blue`, and demonstrated ***asymmetrical communication*** between virtual nodes.
3. We added another 2 virtual nodes, `green` and `yellow`, and put them behind a **virtual router**. We also discussed the DNS requirements for the virtual router.
4. We introduced a **virtual gateway** to make our meshed applications accessible to traffic sources outside the mesh through a carefully controlled access point, and incorporated **Elastic Load Balancing** to distribute traffic between multiple instances of the virtual gateway.
This concludes our tour of the core components of App Mesh, and you should now have the knowledge (and hopefully the confidence!) to start building your own service meshes on Amazon ECS. If you're just looking for the basics of App Mesh, you can stop here. However, if you'd like to learn more about service mesh network security and observability solutions, read on! We're about to take a deep dive into some of the more complex topics.
**This guide is continued in [Part 2](https://hackmd.io/@ForsakenIdol/app-mesh-on-amazon-ecs-part-2).**
*Written by [ForsakenIdol](https://forsakenidol.com/).*