# App Mesh on Amazon ECS - Part 1
*Part 2 of this guide can be found [**here**](https://hackmd.io/@ForsakenIdol/app-mesh-on-amazon-ecs-part-2).*
[**AWS App Mesh**](https://aws.amazon.com/app-mesh/) is a [service mesh](https://aws.amazon.com/what-is/service-mesh/) solution that provides a number of benefits for microservices.
- Standardized communication protocols invisible to applications within container groupings.
- Observability solutions decoupled from application-level implementation.
- High visibility into service-to-service communication for easier debugging and troubleshooting.
Despite these benefits, App Mesh can prove rather finicky to get started with, and a fair portion of the existing learning materials for [ECS](https://docs.aws.amazon.com/app-mesh/latest/userguide/getting-started-ecs.html) and [EKS](https://docs.aws.amazon.com/app-mesh/latest/userguide/getting-started-kubernetes.html) gloss over a number of important conceptual details, either leaving the base application implementation up to the user or jumping straight to a fully-fledged architecture. This makes for a somewhat unstable introduction to the world of service meshes, complicates further App Mesh topics, and has the potential to discourage users from exploring more complex concepts, such as [**TLS configuration**](https://docs.aws.amazon.com/app-mesh/latest/userguide/tls.html) and [**observability routing**](https://docs.aws.amazon.com/app-mesh/latest/userguide/observability.html), which can further improve application security and visibility respectively.
In this tutorial, I'll aim to provide an **in-depth introduction** to App Mesh on **Amazon ECS** and explain the core concepts of this service from the ground up. We'll start with a simple mesh and cover several topics in the following order:
1. Configuring the **initial infrastructure**.
2. Configuring the simplest mesh - just 2 **virtual nodes**.
3. Incorporating a **virtual router** fronting 2 more virtual nodes.
4. Incorporating a **virtual gateway** to allow external users to access specific microservices within the mesh.
5. Securing service mesh communication by incorporating **TLS authentication**.
6. Incorporating the [**AWS Distro for OpenTelemetry**](https://aws.amazon.com/otel/) to route **observability signals** to other AWS services.
**Part 1** of the guide (the part you're currently reading) covers topics #1 to #4, which provide an in-depth introduction to core App Mesh concepts. [**Part 2**](https://hackmd.io/@ForsakenIdol/app-mesh-on-amazon-ecs-part-2) contains topics #5 onwards and covers more advanced concepts.
Throughout this tutorial, we'll be making use of the [**`forsakenidol/colorapp`**](https://hub.docker.com/r/forsakenidol/colorapp/) DockerHub image as our microservice of choice for demonstration purposes. This image is a build of a very simple web application inspired by [`subfuzion/colorteller`](https://hub.docker.com/r/subfuzion/colorteller), but it also includes a shell with `curl` and `nslookup` for testing purposes (which `subfuzion/colorteller` lacks). The `colorapp` image has the following properties:
- It runs on port 80 by default, but you can map this port to any value in ECS. We'll be using the default port 80 in this tutorial.
- It accepts a `COLOR` environment variable to which you can pass a string.
- It only has 2 paths - a root path `/` which simply returns the value of `COLOR`, and a `/healthcheck` path which returns a 200 status code and the word `hi`.
Feel free to pull the image onto your local machine to experiment with it.
### Prerequisites
You'll need the following knowledge to fully understand this tutorial:
- An understanding of DNS in the context of AWS Cloud Map.
- A basic understanding of what a service mesh is - we'll discuss service mesh components in more depth in this tutorial, but you should at least be familiar with the concept of a service mesh.
Should you wish to follow along with this tutorial, you'll need the following resources.
- A VPC with private subnets that have outbound internet access through a NAT gateway, and an ECS cluster configured in your AWS account. You don't need any container instances, since our architecture will be fully [**Fargate-compatible**](https://docs.aws.amazon.com/AmazonECS/latest/userguide/what-is-fargate.html).
- Make sure your VPC has the `Enable DNS hostnames` and `Enable DNS support` settings enabled (checkbox ticked if you're in the AWS console). This is a requirement for using Cloud Map, which we will be making extensive use of in this tutorial.
- The AWS CLI v2 installed on your development machine.
Throughout this tutorial, we'll incorporate the AWS CLI as much as possible to adopt an **infrastructure-as-code** approach to architectural configuration. We'll be storing a lot of our resource configs in JSON files that you'll be able to push to an upstream repository if you so wish.
Let's get started.
## 1. Setting up the Basic Infrastructure
Before we can configure virtual nodes, we must first set up the key components of our infrastructure required to support those virtual nodes. The components we'll be configuring in this section are the DNS namespace for our services, and the parent service mesh in App Mesh.

We'll be using [**Cloud Map**](https://aws.amazon.com/cloud-map/) to manage DNS records within our mesh. This is the upstream service ECS prefers when configuring service discovery. Let's start by creating a Cloud Map [namespace](https://docs.aws.amazon.com/cloud-map/latest/dg/working-with-namespaces.html) - create the first file `01a-namespace.json` and pass it to the following AWS CLI command. Replace `<vpc-id>` with the ID of your pre-existing VPC. Because we're creating a private namespace that will **only** be resolvable within our VPC, Cloud Map service records for this namespace will be tied to that VPC, and we'll launch all of our resources into that VPC as well.
```json=
// 01a-namespace.json
{
    "Name": "colors.local",
    "Description": "Namespace for the color mesh.",
    "Vpc": "<vpc-id>"
}
```
(***Note**: The following command returns an `OperationId` that you can use to track the status of the namespace's creation operation, since it will complete asynchronously to the command. To track the status of the creation process, pass the OperationId to* `aws servicediscovery get-operation --operation-id`.)
```shell
aws servicediscovery create-private-dns-namespace --cli-input-json file://01a-namespace.json
```
<div style="background-color: #88f7b3; padding: 2rem 2rem; margin-bottom: 1.5rem;">
<b>Note</b>: We'll be creating quite a fair number of JSON manifests for our infrastructure in this tutorial. Some manifests have a dependency on specific resources being created, or there may be a logical order in which resources should be created. A good way to remember the order in which those manifests are created is to prefix the manifest name with its applied order, which we'll be doing throughout this tutorial.
</div>
When we proceed to register services within this namespace, those services will be resolvable within the VPC under the `colors.local` suffix. For example, if we register a service called `red`, the FQDN for that service will be `red.colors.local`, and so on for any other service we register into this namespace. The default TTL of SOA records in Cloud Map is 15 seconds, which is sufficient for our purposes - we won't be managing SOA records in this tutorial.
Let's go ahead and register 2 Cloud Map services for use in the next section - `red` and `blue`. You can fetch your `<cloud-map-namespace-id>` from the namespace view in the Cloud Map service.
```json=
// 01b-red.json
{
    "Name": "red",
    "NamespaceId": "<cloud-map-namespace-id>",
    "Description": "Red color service.",
    "DnsConfig": {
        "RoutingPolicy": "MULTIVALUE",
        "DnsRecords": [
            {
                "Type": "A",
                "TTL": 300
            }
        ]
    }
}
```
```json=
// 01b-blue.json
{
    "Name": "blue",
    "NamespaceId": "<cloud-map-namespace-id>",
    "Description": "Blue color service.",
    "DnsConfig": {
        "RoutingPolicy": "MULTIVALUE",
        "DnsRecords": [
            {
                "Type": "A",
                "TTL": 300
            }
        ]
    }
}
```
<div style="background-color: #88f7b3; padding: 2rem 2rem; margin-bottom: 1.5rem;">
<b>Note</b>: If you're wondering how these JSON manifests should be structured, you can append <code>--generate-cli-skeleton</code> to any AWS CLI module's command to create a skeleton of the file you need to pass to <code>--cli-input-json</code>.
</div>
Then, apply the 2 files to our namespace. Whenever files share the same letter prefix, you can apply them in any order.
```shell=
aws servicediscovery create-service --cli-input-json file://01b-red.json
aws servicediscovery create-service --cli-input-json file://01b-blue.json
```
Make a note of the service's `Id` and `Arn` fields in the output of each `create-service` command - we'll be using them in a later section. Later on in this tutorial, `red` and `blue` will be served directly by corresponding virtual services in App Mesh. We now have 2 Cloud Map services at our disposal, and their FQDNs are provided below:
```
red.colors.local
blue.colors.local
```
This is all we'll be doing at this point with Cloud Map in our infrastructure configuration, so let's now move over to the **App Mesh service console** and create a service mesh, which we'll begin to populate in section #2.
```json=
// 01c-mesh.json
{
    "meshName": "color-mesh",
    "spec": {
        "egressFilter": {
            "type": "DROP_ALL"
        }
    }
}
```
```shell
aws appmesh create-mesh --cli-input-json file://01c-mesh.json
```
An egress filter of `DROP_ALL` means that any traffic that...
- Is visible to Envoy,
- Originated from a virtual node in our `color-mesh` service mesh, and
- Isn't destined for a service in our mesh,
... will not be allowed to leave the mesh.
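To confirm the mesh was created with the intended egress filter, you can describe it - the `spec.egressFilter` field in the output should read `DROP_ALL`:
```sh
aws appmesh describe-mesh --mesh-name color-mesh
```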
Congratulations! You've set up the DNS records for our services and created the service mesh in App Mesh. Let's move onto section #2 and start populating this mesh with virtual components and implementing those components in ECS.
## 2. App Mesh: The First 2 Virtual Nodes
App Mesh virtual nodes involve an application, which can consist of one or more containers, and an [**Envoy proxy sidecar**](https://www.envoyproxy.io/), which is used to intercept, route, and / or drop traffic emitted by the application. A single virtual node can have one or more **backends**, which are other applications in the mesh that this virtual node is allowed to communicate with. Communication can be **asymmetrical**, i.e. application A may have application B configured as a backend, allowing A to initiate communication with B, but B may not necessarily have A as a backend, preventing B from initiating communication with A. If this sounds confusing, don't worry - we're going to explore this configuration further in this section!

There's 2 parts to a virtual node in App Mesh:
1. The node's **specification** in App Mesh, including configuration parameters such as open ports, access logging, service discovery, and allowed backends.
2. The node's **implementation** in an upstream service such as ECS (the focus of this tutorial), EKS, or EC2.
The implementation is done by the Envoy proxy sidecar, which, when given the [correct permissions](https://docs.aws.amazon.com/app-mesh/latest/userguide/proxy-authorization.html) and [configuration variable(s)](https://docs.aws.amazon.com/app-mesh/latest/userguide/envoy-config.html), can read the App Mesh specification of the node it has to implement, and configure itself accordingly to match that specification. We declare the specification in the App Mesh service, then tell the Envoy proxy to read and implement that specification.
<div style="background-color: #88f7b3; padding: 2rem 2rem; margin-bottom: 1.5rem;">
<b>Note</b>: Are you familiar with <a href="https://kubernetes.io/"><b>Kubernetes</b></a>? This is not too dissimilar from the <a href="https://kubernetes.io/docs/tasks/manage-kubernetes-objects/declarative-config/"><b>declarative configuration</b></a> that Kubernetes uses, where we specify the desired state of an object, and Kubernetes uses its upstream implementation to create and configure resources that match our desired specification. The only difference with App Mesh is that we need to explicitly create the Envoy proxy that will be used to implement our virtual resources, before Envoy handles its own auto-configuration. App Mesh will not <i>automatically</i> create the proxy for us, but once created and given an App Mesh resource, it <b>will</b> automatically configure itself <b>as</b> that resource.
</div>
### Declaring the Architecture
The simplest architecture for a service mesh involves just 2 virtual nodes. Let's start by configuring those using our `red` and `blue` Cloud Map services we created in section #1. We'll start with the `red` virtual node.
```json=
// 02a-red-node.json
{
    "virtualNodeName": "red",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ],
        "serviceDiscovery": {
            "awsCloudMap": {
                "namespaceName": "colors.local",
                "serviceName": "red"
            }
        },
        "logging": {
            "accessLog": {
                "file": {
                    "format": {
                        "text": "[%START_TIME%] %REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL% %RESPONSE_CODE%\n"
                    },
                    "path": "/dev/stdout"
                }
            }
        }
    }
}
```
```shell
aws appmesh create-virtual-node --cli-input-json file://02a-red-node.json
```
What did we just create?
- Our `red` virtual node will have a single open port - 80 - which corresponds to the aforementioned `colorapp` port when this node is implemented.
- The virtual node will be reachable at the `red.colors.local` FQDN. This corresponds to the `red` service in the `colors.local` Cloud Map namespace - remember this for later!
- The Envoy proxy implementing this virtual node will log its access patterns in a **specific text format** to the `stdout` stream. This format is pieced together using [**supported format strings**](https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage#configuration) in the Envoy documentation.
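You can double-check all of the above by describing the virtual node we just created - the listener, service discovery, and logging settings in the output should match the manifest:
```sh
aws appmesh describe-virtual-node --mesh-name color-mesh --virtual-node-name red
```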
Let's front that virtual node with a virtual service.
```json=
// 02b-red-service.json
{
    "virtualServiceName": "red.colors.local",
    "meshName": "color-mesh",
    "spec": {
        "provider": {
            "virtualNode": {
                "virtualNodeName": "red"
            }
        }
    }
}
```
```shell
aws appmesh create-virtual-service --cli-input-json file://02b-red-service.json
```
Virtual nodes are not served directly - they either sit directly behind a virtual service (as in this scenario), or behind a **virtual router**, which in turn sits behind a virtual service (which we'll explore later in this tutorial). A virtual service cannot serve traffic unless a **provider** is specified - in this case, the provider is the `red` virtual node we created previously.
That's our `red` node - now, let's create a `blue` node.
```json=
// 02c-blue-node.json
{
    "virtualNodeName": "blue",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ],
        "serviceDiscovery": {
            "awsCloudMap": {
                "namespaceName": "colors.local",
                "serviceName": "blue"
            }
        },
        "logging": {
            "accessLog": {
                "file": {
                    "format": {
                        "json": [
                            {
                                "key": "start_time",
                                "value": "%START_TIME%"
                            },
                            {
                                "key": "method",
                                "value": "%REQ(:METHOD)%"
                            },
                            {
                                "key": "request_path",
                                "value": "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
                            },
                            {
                                "key": "protocol",
                                "value": "%PROTOCOL%"
                            },
                            {
                                "key": "response_code",
                                "value": "%RESPONSE_CODE%"
                            }
                        ]
                    },
                    "path": "/dev/stdout"
                }
            }
        },
        "backends": [
            {
                "virtualService": {
                    "virtualServiceName": "red.colors.local"
                }
            }
        ]
    }
}
```
```shell
aws appmesh create-virtual-node --cli-input-json file://02c-blue-node.json
```
This looks rather different from our `02a-red-node.json` file - what's changed here?
- Our `blue` virtual node has the `red.colors.local` virtual service listed as a backend. This means that `blue` will be able to contact `red` through the `red.colors.local` virtual service, but **not** vice versa, since we didn't configure `red`'s backends to include the `blue` application.
- Envoy will log in a different format once it implements the `blue` node. Here, we're telling it to log in the `json` format, albeit with the same 5 format strings as the `red` virtual node's text logging configuration.
Let's also configure a virtual service for our `blue` application.
```json=
// 02d-blue-service.json
{
    "virtualServiceName": "blue.colors.local",
    "meshName": "color-mesh",
    "spec": {
        "provider": {
            "virtualNode": {
                "virtualNodeName": "blue"
            }
        }
    }
}
```
```shell
aws appmesh create-virtual-service --cli-input-json file://02d-blue-service.json
```
Now, return to the top-level of our working directory and let's briefly review the infrastructure we've just configured.
- We have 2 virtual nodes - a `red` node and a `blue` node.

- Each virtual node is served by a virtual service - `red.colors.local` and `blue.colors.local`. We must give a virtual service the same name that meshed applications will use to query it - here, that's the FQDN as per the Cloud Map service.

- `blue` will be able to contact `red` through `red.colors.local` because the relevant service has been configured as a backend. However, the reverse is not true - `red` cannot query `blue.colors.local` because `red` has no virtual service backends.
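You can confirm this picture from the CLI as well - list the virtual nodes and virtual services in the mesh, then describe the `blue` node to see `red.colors.local` listed under its backends:
```sh
aws appmesh list-virtual-nodes --mesh-name color-mesh
aws appmesh list-virtual-services --mesh-name color-mesh
aws appmesh describe-virtual-node --mesh-name color-mesh --virtual-node-name blue
```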
### Implementing the Architecture
Now, let's implement these 2 virtual nodes in ECS. Before we can do this, however, there are a few things we need to configure.
We'll start by creating a [task role](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html) and a [task execution role](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html) for our ECS tasks.
- The task role requires 2 sets of permissions - the AWS-managed `AWSAppMeshEnvoyAccess` IAM policy to provide `appmesh:StreamAggregatedResources` permissions for our Envoy proxy to fetch virtual node information from App Mesh, and another policy that will allow us to execute [**ECS Exec**](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html) commands on our tasks for testing purposes ([these](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html#ecs-exec-required-iam-permissions) are the required permissions for ECS Exec). Notably, the AWS-managed policy for Envoy provides permissions for a single Envoy container to fetch **any** virtual resource information from App Mesh. In a small mesh of our size with only 4 virtual nodes (eventually), this isn't a big problem, but for bigger production meshes, you may want to consider duplicating this policy and scoping it down to a single policy for each virtual resource.

- The task execution role requires the standard `AmazonECSTaskExecutionRolePolicy` AWS-managed IAM policy, plus an additional inline policy that provides the `logs:CreateLogGroup` permission ([this permission is not included in `AmazonECSTaskExecutionRolePolicy`](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonECSTaskExecutionRolePolicy.html) as of the time of writing), which allows ECS to create the log group for our task if it does not already exist.
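If you'd like to create both roles from the CLI as well, here's a minimal sketch. The role names (`color-mesh-task-role` and `color-mesh-task-execution-role`) and the inline policy names are placeholders of my own choosing - adjust them to suit your environment.
```sh
# Trust policy shared by both roles - allows ECS tasks to assume them.
cat > ecs-tasks-trust.json << 'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": { "Service": "ecs-tasks.amazonaws.com" },
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF

# Task role: AWSAppMeshEnvoyAccess for the Envoy proxy, plus the SSM messages
# permissions required by ECS Exec.
aws iam create-role --role-name color-mesh-task-role \
  --assume-role-policy-document file://ecs-tasks-trust.json
aws iam attach-role-policy --role-name color-mesh-task-role \
  --policy-arn arn:aws:iam::aws:policy/AWSAppMeshEnvoyAccess
aws iam put-role-policy --role-name color-mesh-task-role \
  --policy-name ecs-exec-permissions \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": [
        "ssmmessages:CreateControlChannel",
        "ssmmessages:CreateDataChannel",
        "ssmmessages:OpenControlChannel",
        "ssmmessages:OpenDataChannel"
      ],
      "Resource": "*"
    }]
  }'

# Task execution role: the standard execution policy, plus logs:CreateLogGroup
# so ECS can create the color-mesh log group on our behalf.
aws iam create-role --role-name color-mesh-task-execution-role \
  --assume-role-policy-document file://ecs-tasks-trust.json
aws iam attach-role-policy --role-name color-mesh-task-execution-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
aws iam put-role-policy --role-name color-mesh-task-execution-role \
  --policy-name create-log-group \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": "logs:CreateLogGroup",
      "Resource": "*"
    }]
  }'
```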

We'll also configure a few more miscellaneous resources.
- Create a **security group** for our tasks. We need to specify one since we're using the `awsvpc` network mode in Fargate - ECS will create the Fargate task ENIs in our VPC, and we need to attach security group(s) to these ENIs to control traffic to them. For our purposes, we can create a [**self-referencing security group**](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/security-group-rules-reference.html#sg-rules-other-instances), which allows any traffic as long as it originates from a resource that also has the same security group attached. This works for us because we'll only be attaching this security group to our meshed applications. Let's call this security group `colorapp-sg` (a CLI sketch for creating it follows the example output below).

- Get the ARNs of the `red` and `blue` Cloud Map services in our `colors.local` namespace - we'll need to specify these in a few documents later down the line. Each service ID is of the form `srv-xxxxxxxxxx` and can be found in the Cloud Map console; pass it to `aws servicediscovery get-service --id srv-xxxxxxxxxx`, and the service ARN will appear in the command's output. Run this once for each of the `red` and `blue` service IDs in Cloud Map.
```sh=
$ aws servicediscovery get-service --id <color-service-id>
{
    "Service": {
        "Id": "<color-service-id>",
        "Arn": "arn:aws:servicediscovery:eu-west-1:<account-id>:service/<color-service-id>",
        "Name": "red",
        "NamespaceId": "<colors-namespace>",
        "Description": "<color> color service.",
        "DnsConfig": {
            "NamespaceId": "<colors-namespace>",
            "RoutingPolicy": "MULTIVALUE",
            "DnsRecords": [
                {
                    "Type": "A",
                    "TTL": 300
                }
            ]
        },
        "Type": "DNS_HTTP",
        "CreateDate": "<date-service-was-created>",
        "CreatorRequestId": "xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    }
}
```
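As promised, here's a minimal sketch of how the self-referencing `colorapp-sg` security group described above could be created from the CLI - the group name and description are my own, and the group ID placeholder comes from the output of the first command.
```sh
# Create the security group in the same VPC as the Cloud Map namespace.
aws ec2 create-security-group --group-name colorapp-sg \
  --description "Self-referencing security group for meshed color tasks" \
  --vpc-id <vpc-id>

# Allow all traffic, but only from resources that carry this same security group.
aws ec2 authorize-security-group-ingress --group-id <colorapp-sg-id> \
  --ip-permissions 'IpProtocol=-1,UserIdGroupPairs=[{GroupId=<colorapp-sg-id>}]'
```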
Now, we can configure our task definition for the meshed applications. Let's start with the task definition for the `red` virtual node implementation. I'll show you what the JSON file should look like, then we'll break down some of the notable components in the file.
```json=
// 02e-red-td.json
{
    "family": "red_node",
    "taskRoleArn": "<task-role-arn>",
    "executionRoleArn": "<task-execution-role-arn>",
    "networkMode": "awsvpc",
    "containerDefinitions": [
        {
            "name": "red-app",
            "image": "forsakenidol/colorapp",
            "cpu": 128,
            "memory": 128,
            "portMappings": [
                {
                    "containerPort": 80,
                    "protocol": "tcp",
                    "name": "color-port"
                }
            ],
            "essential": true,
            "environment": [
                {
                    "name": "COLOR",
                    "value": "red"
                }
            ],
            "dependsOn": [
                {
                    "containerName": "envoy-sidecar",
                    "condition": "HEALTHY"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "color-mesh",
                    "awslogs-region": "<your-region>",
                    "awslogs-create-group": "true",
                    "awslogs-stream-prefix": "red-app-container"
                }
            },
            "healthCheck": {
                "command": [ "CMD-SHELL", "curl -f http://localhost/health/ || exit 1" ],
                "interval": 5,
                "timeout": 5,
                "retries": 3
            }
        },
        {
            "name": "envoy-sidecar",
            "image": "840364872350.dkr.ecr.ap-southeast-2.amazonaws.com/aws-appmesh-envoy:v1.27.0.0-prod",
            "cpu": 128,
            "memory": 384,
            "essential": true,
            "environment": [
                {
                    "name": "APPMESH_RESOURCE_ARN",
                    "value": "arn:aws:appmesh:<your-region>:<your-account-id>:mesh/color-mesh/virtualNode/red"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "color-mesh",
                    "awslogs-region": "<your-region>",
                    "awslogs-create-group": "true",
                    "awslogs-stream-prefix": "red-envoy-sidecar"
                }
            },
            "startTimeout": 30,
            "healthCheck": {
                "command": [ "CMD-SHELL", "curl -s http://localhost:9901/server_info | grep state | grep -q LIVE" ],
                "interval": 5,
                "retries": 3,
                "startPeriod": 10,
                "timeout": 2
            },
            "user": "1337"
        }
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "256",
    "memory": "512",
    "proxyConfiguration": {
        "type": "APPMESH",
        "containerName": "envoy-sidecar",
        "properties": [
            {
                "name": "IgnoredUID",
                "value": "1337"
            },
            {
                "name": "AppPorts",
                "value": "80"
            },
            {
                "name": "ProxyIngressPort",
                "value": "15000"
            },
            {
                "name": "ProxyEgressPort",
                "value": "15001"
            },
            {
                "name": "EgressIgnoredIPs",
                "value": "169.254.170.2,169.254.169.254"
            }
        ]
    }
}
```
- Make sure to add your task role ARN, task execution role ARN, region, and account ID where the placeholders are for each of these values.
- To ensure that traffic to and from our application container **must** pass through Envoy, we configure a startup ordering. Envoy must become healthy before our application container starts, and we configure this using the `dependsOn` task definition parameter.
- We follow the instructions in step #6 of [**this**](https://docs.aws.amazon.com/app-mesh/latest/userguide/getting-started-ecs.html#update-services) guide to configure the Envoy container. Bear in mind that the Envoy image version referenced here (current at the time of writing) may differ from the latest version available when you read this.
- Note the `"user": "1337"` specification, which explicitly sets the user ID of the main process within the Envoy sidecar. We then tell App Mesh to ignore traffic from that user ID via the `IgnoredUID` property of the `proxyConfiguration`, which prevents Envoy from trying to **proxy its own traffic**.
- The `APPMESH_RESOURCE_ARN` environment variable tells the Envoy sidecar which App Mesh resource it needs to implement. Here, it's trying to implement the `red` virtual node.
You may also notice that despite application traffic being proxied through Envoy, we're actually opening the task port on the `red-app` container, instead of on the `envoy-sidecar` container. This is because the Envoy sidecar, despite being labelled as a **proxy**, behaves slightly differently from the traditional network proxies you may already be familiar with. This difference manifests in the way the proxy handles the task traffic it intercepts. To put it simply:
***For incoming traffic to the task:***
1. Incoming traffic hits the port exposed on the application container.
2. Before the application container actually receives that traffic, IP table rules intercept the traffic and re-route it to Envoy.
3. Envoy processes the traffic, then sends it back to the application container.
***For traffic leaving the task:***
1. Outgoing traffic is sent by the application container.
2. Before traffic leaves the task, IP table rules intercept the traffic and send it to Envoy.
3. Envoy processes the traffic, and if it is taking a valid path through the mesh (or to an out-of-mesh endpoint if allowed), returns it to the point where it was intercepted so it can proceed out of the task.
4. If the traffic is not taking a valid path through the mesh (e.g. trying to reach a service not configured as a backend for this node), Envoy stops the traffic instead.
Let's also configure the task definition for the `blue` virtual node implementation.
```json=
// 02e-blue-td.json
{
    "family": "blue_node",
    "taskRoleArn": "<task-role-arn>",
    "executionRoleArn": "<task-execution-role-arn>",
    "networkMode": "awsvpc",
    "containerDefinitions": [
        {
            "name": "blue-app",
            "image": "forsakenidol/colorapp",
            "cpu": 128,
            "memory": 128,
            "portMappings": [
                {
                    "containerPort": 80,
                    "protocol": "tcp",
                    "name": "color-port"
                }
            ],
            "essential": true,
            "environment": [
                {
                    "name": "COLOR",
                    "value": "blue"
                }
            ],
            "dependsOn": [
                {
                    "containerName": "envoy-sidecar",
                    "condition": "HEALTHY"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "color-mesh",
                    "awslogs-region": "<your-region>",
                    "awslogs-create-group": "true",
                    "awslogs-stream-prefix": "blue-app-container"
                }
            },
            "healthCheck": {
                "command": [ "CMD-SHELL", "curl -f http://localhost/health/ || exit 1" ],
                "interval": 5,
                "timeout": 5,
                "retries": 3
            }
        },
        {
            "name": "envoy-sidecar",
            "image": "840364872350.dkr.ecr.ap-southeast-2.amazonaws.com/aws-appmesh-envoy:v1.27.0.0-prod",
            "cpu": 128,
            "memory": 384,
            "essential": true,
            "environment": [
                {
                    "name": "APPMESH_RESOURCE_ARN",
                    "value": "arn:aws:appmesh:<your-region>:<your-account-id>:mesh/color-mesh/virtualNode/blue"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "color-mesh",
                    "awslogs-region": "<your-region>",
                    "awslogs-create-group": "true",
                    "awslogs-stream-prefix": "blue-envoy-sidecar"
                }
            },
            "startTimeout": 30,
            "healthCheck": {
                "command": [ "CMD-SHELL", "curl -s http://localhost:9901/server_info | grep state | grep -q LIVE" ],
                "interval": 5,
                "retries": 3,
                "startPeriod": 10,
                "timeout": 2
            },
            "user": "1337"
        }
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "256",
    "memory": "512",
    "proxyConfiguration": {
        "type": "APPMESH",
        "containerName": "envoy-sidecar",
        "properties": [
            {
                "name": "IgnoredUID",
                "value": "1337"
            },
            {
                "name": "AppPorts",
                "value": "80"
            },
            {
                "name": "ProxyIngressPort",
                "value": "15000"
            },
            {
                "name": "ProxyEgressPort",
                "value": "15001"
            },
            {
                "name": "EgressIgnoredIPs",
                "value": "169.254.170.2,169.254.169.254"
            }
        ]
    }
}
```
The two task definitions are nearly identical - the only differences are the parameters that pertain to the color being served and the name of the main application container.
Let's register these task definitions.
```shell
aws ecs register-task-definition --cli-input-json file://02e-red-td.json
aws ecs register-task-definition --cli-input-json file://02e-blue-td.json
```
### Launching the Implementation
Now, we can create services from our task definitions to actually launch the virtual node implementations into our ECS cluster. Create the service JSON specifications for each of the `red` and `blue` services as follows.
```json=
// 02f-red-service.json
{
    "cluster": "<ecs-cluster-name>",
    "serviceName": "red-svc",
    "taskDefinition": "red_node",
    "serviceRegistries": [
        { "registryArn": "<your-red-service-cloud-map-arn>" }
    ],
    "desiredCount": 1,
    "launchType": "FARGATE",
    "platformVersion": "LATEST",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [ "<list-of-desired-private-subnets-in-string-form>" ],
            "securityGroups": [ "<your-color-app-security-group>" ],
            "assignPublicIp": "DISABLED"
        }
    },
    "schedulingStrategy": "REPLICA",
    "deploymentController": { "type": "ECS" },
    "enableExecuteCommand": true
}
```
```json=
// 02f-blue-service.json
{
    "cluster": "<ecs-cluster-name>",
    "serviceName": "blue-svc",
    "taskDefinition": "blue_node",
    "serviceRegistries": [
        { "registryArn": "<your-blue-service-cloud-map-arn>" }
    ],
    "desiredCount": 1,
    "launchType": "FARGATE",
    "platformVersion": "LATEST",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [ "<list-of-desired-private-subnets-in-string-form>" ],
            "securityGroups": [ "<your-color-app-security-group>" ],
            "assignPublicIp": "DISABLED"
        }
    },
    "schedulingStrategy": "REPLICA",
    "deploymentController": { "type": "ECS" },
    "enableExecuteCommand": true
}
```
A few things to note right off the bat:
- Remember the ARNs we fetched of the `red` and `blue` Cloud Map services in our `colors.local` namespace? We need to specify them in the service definitions above as values for the `registryArn` keys.
- Launch all resources into private subnets - in the interests of security, you should never launch a service (in App Mesh or otherwise) into public subnets. Even if you're making the service publicly accessible, this is generally done through a gateway mechanism, such as an [**Elastic Load Balancer**](https://aws.amazon.com/elasticloadbalancing/), the listeners for which will be the ones in the public subnets - not the application itself (*we're not fronting* `red` *or* `blue` *with load balancers, but this is just a good tip to keep in mind*).
Let's create these services.
```shell=
aws ecs create-service --cli-input-json file://02f-red-service.json
aws ecs create-service --cli-input-json file://02f-blue-service.json
```
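Before moving on to testing, it can be handy to wait for both services to reach a steady state and confirm that each one has a running task - a quick sketch using the ECS waiters:
```sh
# Block until both services are stable, then print their task counts.
aws ecs wait services-stable --cluster <ecs-cluster-name> --services red-svc blue-svc
aws ecs describe-services --cluster <ecs-cluster-name> --services red-svc blue-svc \
  --query "services[].{name:serviceName,desired:desiredCount,running:runningCount}"
```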
### Testing the Implementation
Congratulations! You've launched the first 2 virtual nodes into your mesh! That being said, how do we test virtual-resource-to-virtual-resource connectivity within the mesh? That's where [**ECS Exec**](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html#ecs-exec-enabling-and-using) comes into play for our test architecture. We'll open a shell into each virtual node's container and use `curl` to simulate a connection to the other virtual service, in lieu of another application connection such as a database or microservice connection.
In the ECS console, first, grab the task ID for your `blue` service, then run the following command.
```sh=
aws ecs execute-command --cluster <cluster-name> \
--task <blue-task-id> \
--container blue-app \
--interactive \
--command "/bin/sh"
```
(*If you're on Windows, and you run into a command-not-found issue with* `/bin/sh` *, try changing this to just* `sh`*.*)
If ECS Exec has been correctly configured, you will be at a shell. Execute the following 3 commands:
1. `curl localhost` - This should print `blue`, since we're serving the `blue` application from this container.
2. `nslookup red.colors.local` - This should give you the private IP address of the `red` ECS task.
3. Finally, `curl red.colors.local` - This should print `red`, because the `red.colors.local` virtual service was configured as a backend to our `blue` virtual node.
Now, exit out of the `blue` shell, grab the task ID for your `red` service, then run the following command.
```sh=
aws ecs execute-command --cluster <cluster-name> \
--task <red-task-id> \
--container red-app \
--interactive \
--command "/bin/sh"
```
Execute the following 4 commands.
1. `curl localhost` - This should print `red`, since we're serving the `red` application from this container.
2. `nslookup blue.colors.local` - This should give you the private IP address of the `blue` ECS task.
3. `curl blue.colors.local` - This should fail, typically with a `recv` error or an empty reply from the server, because the `red` virtual node does not have any virtual service backends.
4. `while true; do curl blue.colors.local; sleep 1; done;`. This should put you into a loop where the `curl` command continuously fails. Leave this running in the shell for now and read the next steps below.
While that final command is running, navigate to the `color-mesh` in the App Mesh console, select the `red` virtual node, click on `Edit` in the top right-hand corner, and scroll down to `Service backends - recommended`. This is where we add backends to virtual nodes in the console. Expand this section, select `Add backend`, and specify the `blue.colors.local` virtual service name. Leave all other values at their defaults and select `Save` at the bottom of the screen.
Within a minute or so, the output of command loop #4 above should start printing `blue` instead of the persistent error you previously encountered, since we've now added the `blue.colors.local` virtual service as a backend to our `red` virtual node. The reason this doesn't happen instantly is that Envoy needs time to retrieve the updated configuration from the App Mesh service. Now, remove the virtual service backend we just added (so that the `red` virtual node has no backends), and watch the command loop return to spitting out the same error as before.
**Congratulations!** You've successfully configured the first 2 virtual nodes and virtual services in your mesh, and demonstrated asymmetrical virtual node communication between the `red` and `blue` services in your service mesh.
### Aside: Envoy Metrics
The Envoy container, within each of our tasks, exposes an [**administration interface**](https://www.envoyproxy.io/docs/envoy/v1.27.2/operations/admin) on port `9901`. We can `curl localhost:9901` from within the application container to get some key information about the execution of the Envoy container. Here are a few key metrics to be aware of:
- `curl localhost:9901/server_info`: Information about the currently-running instance of Envoy in this task.
- `curl localhost:9901/ready`: Prints the [**running state of the Envoy sidecar**](https://www.envoyproxy.io/docs/envoy/v1.27.2/api-v3/admin/v3/server_info.proto#envoy-v3-api-msg-admin-v3-serverinfo). This is particularly useful as a health check for the container.
- `curl localhost:9901/stats | grep connected`: Prints the `control_plane.connected_state` variable, which is set to 1 if the sidecar is connected to App Mesh. Useful for determining issues with the task's network connection.
- `curl localhost:9901/stats/prometheus`: Exposes Envoy container statistics in a **Prometheus-compatible** format, which can be plugged into an upstream Prometheus scraper.
You can find a full list of supported endpoints in the [**Envoy documentation**](https://www.envoyproxy.io/docs/envoy/v1.27.2/operations/admin). Envoy exposes a huge number of metrics that can lend insight into the operation of the proxy and downstream application in the same task, fulfilling the **metrics** pillar of observability for meshed applications. In a later section, we'll have a look at how we can send these metrics to an upstream observability backend using [**OpenTelemetry**](https://opentelemetry.io/).
## 3. The Virtual Router
Now, we'll introduce a [**virtual router**](https://docs.aws.amazon.com/app-mesh/latest/userguide/virtual_routers.html) to our mesh, and with it, the final 2 virtual nodes - `green` and `yellow`. Virtual routers cannot route traffic to other virtual services - they can only send traffic to **virtual nodes**.
### Green and Yellow
Before we can configure the virtual nodes, we first need to set up the Cloud Map services for both of these nodes in the `colors.local` namespace.

We'll take a similar approach to when we configured the `red` and `blue` services in our manifests. We'll start with the first service...
```json=
// 03a-router.json
{
    "Name": "router",
    "NamespaceId": "<cloud-map-namespace-id>",
    "Description": "Green color service, and DNS record for multiple colors.",
    "DnsConfig": {
        "RoutingPolicy": "MULTIVALUE",
        "DnsRecords": [
            {
                "Type": "A",
                "TTL": 300
            }
        ]
    }
}
```
```sh=
aws servicediscovery create-service --cli-input-json file://03a-router.json
```
... then move onto the second.
```json=
// 03a-yellow.json
{
    "Name": "yellow",
    "NamespaceId": "<cloud-map-namespace-id>",
    "Description": "Yellow color service.",
    "DnsConfig": {
        "RoutingPolicy": "MULTIVALUE",
        "DnsRecords": [
            {
                "Type": "A",
                "TTL": 300
            }
        ]
    }
}
```
```sh=
aws servicediscovery create-service --cli-input-json file://03a-yellow.json
```
At this point, you're probably wondering - **why did I call the Cloud Map service corresponding to the `green` node `router` instead of calling it `green`?** Hold onto that question, because we're going to explain it in just a bit. (*You may be able to glean the answer from the service's description, but if you're still unsure, don't worry! We'll cover this after we configure the virtual router.*)
Now, we can set up the `green` and `yellow` virtual nodes. Once again, notice the similarities between the virtual node specifications, first for the `green` virtual node...
```json=
// 03b-green-node.json
{
    "virtualNodeName": "green",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ],
        "serviceDiscovery": {
            "awsCloudMap": {
                "namespaceName": "colors.local",
                "serviceName": "router"
            }
        },
        "logging": {
            "accessLog": {
                "file": {
                    "path": "/dev/stdout"
                }
            }
        }
    }
}
```
```sh=
aws appmesh create-virtual-node --cli-input-json file://03b-green-node.json
```
... then for the `yellow` virtual node.
```json=
// 03b-yellow-node.json
{
    "virtualNodeName": "yellow",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ],
        "serviceDiscovery": {
            "awsCloudMap": {
                "namespaceName": "colors.local",
                "serviceName": "yellow"
            }
        },
        "logging": {
            "accessLog": {
                "file": {
                    "path": "/dev/stdout"
                }
            }
        }
    }
}
```
```sh=
aws appmesh create-virtual-node --cli-input-json file://03b-yellow-node.json
```
*Note that we haven't specified a custom logging string for these virtual nodes - they will inherit the [**default format string**](https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage) in the Envoy documentation.*
Now let's move on to ECS. Instead of pasting the task definition in its entirety as was done previously with the `red` and `blue` nodes, I'll instead highlight the sections of the task definition you should change, since the task definitions for all 4 nodes are very similar.
- `family` - This should be unique for each task definition.
```json
"family": "green_node",
```
- The `name` of the main application container.
```json
"name": "green-app",
```
- The `environment` variable `COLOR` for the main application container.
```json
"environment": [
{
"name": "COLOR",
"value": "green"
}
],
```
- The CloudWatch log stream prefixes for the `colorapp` and `envoy` containers. Here's an example for the `colorapp` container.
```json
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "color-mesh",
"awslogs-region": "<your-region>",
"awslogs-create-group": "true",
"awslogs-stream-prefix": "green-app-container"
}
},
```
- The `APPMESH_RESOURCE_ARN` environment variable for the Envoy sidecar, so it knows which virtual node to implement.
```json
"environment" : [
{
"name" : "APPMESH_RESOURCE_ARN",
"value" : "arn:aws:appmesh:<your-region>:<your-account-id>:mesh/color-mesh/virtualNode/green"
}
],
```
Once both task definitions (*I've called them* `03c-green-td.json` *and* `03c-yellow-td.json`) have been registered, we can create ECS services from them. First for the `green` service...
```json=
// 03d-green-service.json
{
    "cluster": "<cluster-name>",
    "serviceName": "green-svc",
    "taskDefinition": "green_node",
    "serviceRegistries": [
        { "registryArn": "<router-service-cloud-map-arn>" }
    ],
    "desiredCount": 1,
    "launchType": "FARGATE",
    "platformVersion": "LATEST",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [ <subnets> ],
            "securityGroups": [ <colorapp-security-group> ],
            "assignPublicIp": "DISABLED"
        }
    },
    "schedulingStrategy": "REPLICA",
    "deploymentController": { "type": "ECS" },
    "enableExecuteCommand": true
}
```
```sh=
aws ecs create-service --cli-input-json file://03d-green-service.json
```
... then for the `yellow` service.
```json=
// 03d-yellow-service.json
{
    "cluster": "<cluster-name>",
    "serviceName": "yellow-svc",
    "taskDefinition": "yellow_node",
    "serviceRegistries": [
        { "registryArn": "<yellow-service-cloud-map-arn>" }
    ],
    "desiredCount": 1,
    "launchType": "FARGATE",
    "platformVersion": "LATEST",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [ <subnets> ],
            "securityGroups": [ <colorapp-security-group> ],
            "assignPublicIp": "DISABLED"
        }
    },
    "schedulingStrategy": "REPLICA",
    "deploymentController": { "type": "ECS" },
    "enableExecuteCommand": true
}
```
```sh=
aws ecs create-service --cli-input-json file://03d-yellow-service.json
```
At this point in time, we've introduced the `green` and `yellow` virtual nodes to our mesh. However, these nodes are isolated:
- They have no backends, and so will not be able to reach other virtual services in the mesh.
- They have not been configured as a backend for any of the other virtual nodes (and they do not have a virtual service in front of them for this to be possible).

When we configured the `red` and `blue` virtual nodes, we fronted them directly with corresponding virtual services `red.colors.local` and `blue.colors.local`. In this section, however, we're going to front the `green` and `yellow` virtual nodes with a `color-router` virtual router, which in turn is going to sit behind a virtual service `router.colors.local`. The `color-router` is going to route incoming traffic to the `green` and `yellow` virtual nodes.
### Configuring the Virtual Router
As we mentioned previously, a virtual service can serve traffic to virtual node backends in 2 different ways:
1. Directly to a **virtual node**, presenting a 1-to-1 connection between the virtual service and a virtual node.
2. To one or more virtual nodes by means of a **virtual router**, which sits between the virtual service and the pool of virtual nodes and allows for more advanced request routing to the nodes.
We've already explored the first method when we set up the `red` and `blue` virtual nodes and configured their corresponding virtual services to sit directly in front of those nodes. In this section, we're going to explore option #2.

We'll start by configuring the `color-router` virtual router in App Mesh...
```json=
// 03e-color-router.json
{
    "virtualRouterName": "color-router",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ]
    }
}
```
```sh
aws appmesh create-virtual-router --cli-input-json file://03e-color-router.json
```
... add a `color-route` to our mesh that will listen for incoming GET requests on port 80 and distribute traffic to our `green` and `yellow` virtual nodes at a 4:1 (80/20) split respectively...
```json=
// 03f-color-route.json
{
    "routeName": "color-route",
    "virtualRouterName": "color-router",
    "meshName": "color-mesh",
    "spec": {
        "priority": 1,
        "httpRoute": {
            "match": {
                "method": "GET",
                "port": 80,
                "prefix": "/"
            },
            "action": {
                "weightedTargets": [
                    {
                        "port": 80,
                        "virtualNode": "green",
                        "weight": 4
                    },
                    {
                        "port": 80,
                        "virtualNode": "yellow",
                        "weight": 1
                    }
                ]
            }
        }
    }
}
```
```sh
aws appmesh create-route --cli-input-json file://03f-color-route.json
```
... and finally, front the virtual router with a virtual service.
```json=
// 03g-color-service.json
{
    "virtualServiceName": "router.colors.local",
    "meshName": "color-mesh",
    "spec": {
        "provider": {
            "virtualRouter": {
                "virtualRouterName": "color-router"
            }
        }
    }
}
```
```sh
aws appmesh create-virtual-service --cli-input-json file://03g-color-service.json
```
### Explaining FQDNs and Virtual Routers
At this point in time, we can discuss why I chose to name the `green` virtual node's Cloud Map service `router` instead of just calling it `green`. In truth, it actually doesn't matter what you name the Cloud Map service, as long as the naming is **consistent** with other resources further down the line. **What does that mean?**
Our virtual router, despite routing traffic to **more than one** virtual node, is still being fronted by a virtual service that needs an FQDN that is **resolvable by virtual nodes within the service mesh**. Because our service discovery mechanism is handled by the **Cloud Map namespace** `colors.local`, we'll need a Cloud Map service called `router` in this namespace, and that service will need **at least 1 service instance entry**. This is so that when you make a DNS request for `router.colors.local` (e.g. using `nslookup`, which the `forsakenidol/colorapp` image has access to) from a virtual node, one or more entries are returned.
Envoy, despite knowing where to route your traffic when you make a request to `router.colors.local`, is still at the mercy of the upstream DNS server to actually return a response to the virtual node from which the DNS request for `router.colors.local` was made. If the `router` Cloud Map service doesn't exist, the DNS request will fail, and if this happens, no HTTP request will be made, and **Envoy will not have an HTTP request to intercept**.
<div style="background-color: #fa867f; padding: 2rem 2rem; margin-bottom: 1.5rem;">
<b>Note</b>: <b>Do not</b> proxy DNS traffic on port 53 through Envoy. You might be tempted to do this to simplify virtual router configuration (and possibly avoid the aforementioned DNS record setup), but the Envoy proxy does not have DNS server functionality and will not know how to handle the incoming DNS traffic, not least because we haven't defined mappings for port 53. This is one of the reasons why App Mesh service discovery relies on an upstream DNS service separate from Envoy (Cloud Map in our circumstances) to provide it with the actual target addresses for a given FQDN.
</div>
A typical deployment pattern that is made simple through the use of a virtual router is covered in the [**official AWS App Mesh Workshop**](https://www.appmeshworkshop.com/servicediscovery/). It involves a **canary** deployment, where the original `crystal` service served through a load balancer constitutes the old application, and the `crystal` service using service discovery constitutes the new application. The old application must be replaced by the new application with minimal downtime, a condition that can be met by using a virtual router and a route that slowly shifts traffic from the old application to the new one by modifying the target weights in increments. In this scenario, the virtual service in front of the virtual router is given the **shared name of the application** - `crystal.appmeshworkshop.hosted.local` - which makes sense in this context, given that the application remains the same. We could just as well have named our router's fronting service `green.colors.local`, as long as **the Cloud Map service is also called `green`**.
In fact, even though the virtual service in front of the virtual router requires a DNS-resolvable FQDN, the actual targets returned for that FQDN **do not actually need to be served** by the virtual router - the DNS record just has to exist! We'll explore this further in our testing.
### Testing the Virtual Router
Before we can test our router, we need to add our `router.colors.local` virtual service as a backend to one of the nodes so we can `curl` this service. Let's add it to the `red` virtual node.
```json=
// 03h-update-red.json
{
    "virtualNodeName": "red",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ],
        "serviceDiscovery": {
            "awsCloudMap": {
                "namespaceName": "colors.local",
                "serviceName": "red"
            }
        },
        "logging": {
            "accessLog": {
                "file": {
                    "format": {
                        "text": "[%START_TIME%] %REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL% %RESPONSE_CODE%\n"
                    },
                    "path": "/dev/stdout"
                }
            }
        },
        "backends": [
            {
                "virtualService": {
                    "virtualServiceName": "router.colors.local"
                }
            }
        ]
    }
}
```
```sh
aws appmesh update-virtual-node --cli-input-json file://03h-update-red.json
```
The virtual node's specification here is exactly the same as before, with the exception of the `backends` section we've just added, in which we've specified the `router.colors.local` virtual service.
Let's now **ECS Exec** into the red virtual node's ECS task to run some commands.
```sh
aws ecs execute-command --cluster <cluster-name> \
--task <red-task-id> \
--container red-app \
--interactive \
--command "/bin/sh"
```
- `nslookup router.colors.local` - This should return the private IP address of the `green` virtual node's ECS task, because we registered it to the `router` Cloud Map service.
- `while true; do curl router.colors.local && echo; done` - This will repeatedly hit the `router.colors.local` DNS name. Notice the distribution between `green` and `yellow` at the expected 4:1 ratio as per our route configuration, despite the previous `nslookup` command only returning one value - this is because Envoy is intercepting the request and routing it as per our virtual router configuration, instead of simply allowing the request to be fulfilled by the resource returned by Cloud Map.
Leave the command loop running and navigate to the virtual router we just configured in the App Mesh service in the AWS web console. Navigate to the "Routes" tab, select the `color-route`, and click on the `Edit` button in the top-right. Here, we'll be able to modify the route's **Target Configuration**. Let's experiment with some of the following options while observing the output of the command loop.
- Set the `green` virtual node's relative weight to 0. Notice the percentage change that will result in 100% of the traffic going to the yellow node.
- Replace the `green` virtual node with the `blue` virtual node and add a positive, non-zero weight value to this target. Now, the target registered behind the `router` Cloud Map service (the `green` virtual node's ECS task) isn't being served by this router. Observe the lack of `green` in the `while` loop's output.
*(Remember to click the **Save** button at the bottom of the **Edit** menu after each change!)*
Experiment with different settings for the route's weighted targets and try to predict the traffic distribution each setting will result in. Once you're done, return the router's traffic distribution to the 80/20 split of `green`-to-`yellow` that it was at before.
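If you'd prefer to keep this experimentation in line with the infrastructure-as-code approach we've been following, the same weight changes can be applied with `update-route` instead of the console. Here's a sketch that shifts the route to an even 50/50 split - the `03i-update-color-route.json` file name is hypothetical, and the manifest simply mirrors `03f-color-route.json` with different weights.
```sh
# Hypothetical manifest name; mirrors 03f-color-route.json with a 50/50 weight split.
cat > 03i-update-color-route.json << 'EOF'
{
    "routeName": "color-route",
    "virtualRouterName": "color-router",
    "meshName": "color-mesh",
    "spec": {
        "priority": 1,
        "httpRoute": {
            "match": { "method": "GET", "port": 80, "prefix": "/" },
            "action": {
                "weightedTargets": [
                    { "port": 80, "virtualNode": "green", "weight": 1 },
                    { "port": 80, "virtualNode": "yellow", "weight": 1 }
                ]
            }
        }
    }
}
EOF

aws appmesh update-route --cli-input-json file://03i-update-color-route.json
```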
## 4. The Virtual Gateway
Up until this point, we've only called the meshed virtual services from other virtual nodes. Now, we're going to introduce the final key App Mesh component - the [**virtual gateway**](https://docs.aws.amazon.com/app-mesh/latest/userguide/virtual_gateways.html) - to expose our services outside of the mesh.

### Declaring the Architecture
As with all virtual resources in App Mesh, we must first begin by declaring the virtual gateway.
```json=
// 04a-color-gateway.json
{
    "virtualGatewayName": "color-gateway",
    "meshName": "color-mesh",
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 80,
                    "protocol": "http"
                }
            }
        ],
        "logging": {
            "accessLog": {
                "file": { "path": "/dev/stdout" }
            }
        }
    }
}
```
```sh
aws appmesh create-virtual-gateway --cli-input-json file://04a-color-gateway.json
```
Alone, the virtual gateway doesn't have much of a function - it has a listener, but once traffic reaches that listener, the gateway currently doesn't know where to send it. We need to attach a [**gateway route**](https://docs.aws.amazon.com/app-mesh/latest/userguide/gateway-routes.html) that tells the virtual gateway where the traffic needs to be sent. Gateway routes have a lot of functionality for different route protocols, and they can match specific parameters in the request - in this tutorial, we'll demonstrate **prefix matching** on the request path.
```json=
// 04b-color-gateway-route-1.json
{
    "gatewayRouteName": "color-gateway-route-1",
    "virtualGatewayName": "color-gateway",
    "meshName": "color-mesh",
    "spec": {
        "httpRoute": {
            "action": {
                "target": {
                    "virtualService": {
                        "virtualServiceName": "router.colors.local"
                    }
                }
            },
            "match": {
                "prefix": "/router"
            }
        }
    }
}
```
```sh
aws appmesh create-gateway-route --cli-input-json file://04b-color-gateway-route-1.json
```
In the manifest above:
- We attach a gateway route to our `color-gateway` which looks for requests hitting the `/router` prefix on the virtual gateway.
- The inbound `/router` request must be made using the `HTTP/1` protocol.
- When a request matches this prefix, the route will send it to the `router.colors.local` service. Recall from the previous section that this service was implemented using a virtual router, with 2 backends - `green` and `yellow`.
Let's also add a root path `/` route that will direct traffic to another service - `blue.colors.local`.
```json=
// 04b-color-gateway-route-2.json
{
    "gatewayRouteName": "color-gateway-route-2",
    "virtualGatewayName": "color-gateway",
    "meshName": "color-mesh",
    "spec": {
        "httpRoute": {
            "action": {
                "target": {
                    "virtualService": {
                        "virtualServiceName": "blue.colors.local"
                    }
                }
            },
            "match": {
                "method": "GET",
                "port": 80,
                "prefix": "/"
            }
        }
    }
}
```
```sh
aws appmesh create-gateway-route --cli-input-json file://04b-color-gateway-route-2.json
```
In this scenario, we have a root path `/` route and a `/router` route. When a valid request hits the virtual gateway, it is matched against the **longer** prefix, meaning that a request for `/router` will not be matched against the root path because `/` is shorter than `/router`. This prevents a single request from being answered by more than one route.
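You can review both gateway routes (and confirm which prefixes and targets they carry) from the CLI:
```sh
aws appmesh list-gateway-routes --mesh-name color-mesh --virtual-gateway-name color-gateway
aws appmesh describe-gateway-route --mesh-name color-mesh \
  --virtual-gateway-name color-gateway --gateway-route-name color-gateway-route-2
```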
### Implementing the Architecture
Now that we've declared our virtual gateway in App Mesh, let's implement it in ECS. While our virtual nodes were previously implemented by spinning up an Envoy proxy as a **sidecar** to the main application container in our tasks, a virtual gateway is implemented by configuring Envoy to **run alone** as the main container in a task.
Incoming traffic will reach Envoy directly, which will be listening on the same port we configured in `04a-color-gateway.json`, and Envoy will implement the upstream virtual gateway using [**the same environment variable**](https://docs.aws.amazon.com/app-mesh/latest/userguide/envoy-config.html#envoy-required-config) as previously - `APPMESH_RESOURCE_ARN`. This means that we'll be opening ports on the Envoy container directly, instead of on another container in the task, because Envoy is the main container for the virtual gateway.
We'll start with the task definition.
```json=
// 04c-gateway-td.json
{
    "family": "gateway",
    "taskRoleArn": "<ecs-task-role-arn>",
    "executionRoleArn": "<ecs-task-execution-role-arn>",
    "networkMode": "awsvpc",
    "containerDefinitions": [
        {
            "name": "envoy-proxy",
            "image": "840364872350.dkr.ecr.ap-southeast-2.amazonaws.com/aws-appmesh-envoy:v1.27.0.0-prod",
            "cpu": 256,
            "memory": 512,
            "portMappings": [
                {
                    "containerPort": 80,
                    "protocol": "tcp",
                    "name": "gateway-port"
                }
            ],
            "essential": true,
            "environment": [
                {
                    "name": "APPMESH_RESOURCE_ARN",
                    "value": "arn:aws:appmesh:<your-region>:<your-account-id>:mesh/color-mesh/virtualGateway/color-gateway"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "color-mesh",
                    "awslogs-region": "<your-region>",
                    "awslogs-create-group": "true",
                    "awslogs-stream-prefix": "04e-gateway"
                }
            },
            "startTimeout": 30,
            "healthCheck": {
                "command": [ "CMD-SHELL", "curl -s http://localhost:9901/server_info | grep state | grep -q LIVE" ],
                "interval": 5,
                "retries": 3,
                "startPeriod": 10,
                "timeout": 2
            }
        }
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "256",
    "memory": "512"
}
```
```sh
aws ecs register-task-definition --cli-input-json file://04c-gateway-td.json
```
- Notice the updated value of the `APPMESH_RESOURCE_ARN` environment variable, to which we're now passing the ARN of the virtual gateway.
- Unlike in previous task definitions for virtual nodes, the Envoy container in the virtual gateway doesn't have to share its resource budget with any other container, and hence we can allocate it the lion's share of the task's resource requirements.
- Notice the lack of a `proxyConfiguration` field - there isn't a separate application container whose traffic needs to be transparently redirected through Envoy this time. Rather, Envoy **is** the main application container to which traffic will be sent, so there's nothing for a proxy configuration to intercept.
Before we can bring up a task that implements our virtual gateway, we need to configure a **security group** that allows traffic on port 80 to reach the task. This is the first mesh resource that's intended to be reachable from outside the mesh (and outside the VPC), so its security group needs an inbound rule that's open to the outside world. We'll call it `gateway-sg`.
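As a minimal sketch of how you might create it from the CLI - assuming your mesh's VPC ID, and that you're comfortable allowing HTTP from anywhere (tighten the CIDR range if you're not):
```sh
# Create the gateway security group in the mesh VPC.
aws ec2 create-security-group --group-name gateway-sg \
    --description "Public HTTP access to the virtual gateway" \
    --vpc-id <your-mesh-vpc-id>
# Allow inbound TCP traffic on port 80 from anywhere.
aws ec2 authorize-security-group-ingress --group-id <gateway-sg-id> \
    --protocol tcp --port 80 --cidr 0.0.0.0/0
```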

Now, let's bring up an instance of this implementation.
```json=
// 04d-gateway-service-public-subnets.json
{
    "cluster": "<cluster-name>",
    "serviceName": "gateway-svc",
    "taskDefinition": "gateway",
    "desiredCount": 1,
    "launchType": "FARGATE",
    "platformVersion": "LATEST",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [ <list-of-public-subnets> ],
            "securityGroups": [ <colorapp-sg-id>, <gateway-sg-id> ],
            "assignPublicIp": "ENABLED"
        }
    },
    "schedulingStrategy": "REPLICA",
    "deploymentController": { "type": "ECS" }
}
```
```sh
aws ecs create-service --cli-input-json file://04d-gateway-service-public-subnets.json
```
- Because we want the gateway to be publicly reachable, we must launch it into the public subnets of the same VPC in which the other 4 nodes in our service mesh reside. At present, this is the only public task in our entire mesh.
- The other security group we attach to this service is the `colorapp` security group - the same one we previously attached to the 4 tasks that implement the virtual nodes in our color mesh. It allows the virtual gateway implementation to route incoming traffic to the relevant backend(s) as per the gateway routes we configured earlier, while `gateway-sg` allows us to reach the gateway task from the outside.
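The next section queries the gateway task by its public IP address. If you'd rather not dig through the console for it, here's a hedged sketch of how you might look it up from the CLI - it assumes your cluster name and the `gateway-svc` service name used above:
```sh
# Find the task started by the gateway service, then its ENI, then the ENI's public IP.
TASK_ARN=$(aws ecs list-tasks --cluster <cluster-name> --service-name gateway-svc \
    --query 'taskArns[0]' --output text)
ENI_ID=$(aws ecs describe-tasks --cluster <cluster-name> --tasks $TASK_ARN \
    --query "tasks[0].attachments[0].details[?name=='networkInterfaceId'].value" --output text)
aws ec2 describe-network-interfaces --network-interface-ids $ENI_ID \
    --query 'NetworkInterfaces[0].Association.PublicIp' --output text
```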
### Testing the Implementation
Because we enabled public IP assignment in our gateway service, we can access the task that's been spun up by this service via its public IP address. From your local development machine this time (*and assuming you have public internet access - how else would you be reading this guide?*), let's run the following command to test the virtual gateway:
```sh
curl <task-public-ip>
```
We're hitting one of the 2 routes that the gateway has been configured to listen on - the root path - which should return `blue`. Let's try the other route.
```sh
while true; do curl <task-public-ip>/router && echo; sleep 1; done
```
*The `echo` command is just to space the loop's output.*
When we query this route, we should see `green` and `yellow` at roughly the same 4:1 ratio we observed when querying the `router.colors.local` service from within the mesh (assuming you haven't modified the router's distribution, or have since reset it back to its default weights).
```
// Example output
green
green
green
green
yellow
green
yellow
...
```
Like other aspects of the mesh, we can modify a gateway route in-flight while the above loop is running. Navigate to the `color-gateway` virtual gateway in the App Mesh console, select the **Gateway routes** tab, select the `color-gateway-route-1` gateway route and click **Edit**. Experiment with the virtual service that this gateway route sends traffic to, as well as the prefix the route listens on. Once you're ready, move on to the next section.
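If you'd rather experiment from the CLI, `update-gateway-route` achieves the same thing. As a hedged sketch - this example points the `/router` prefix route at `blue.colors.local` instead of the router-backed service, mirroring the spec shape used in `04b-color-gateway-route-2.json`:
```sh
# Point the /router prefix route at the blue virtual service instead.
aws appmesh update-gateway-route --mesh-name color-mesh \
    --virtual-gateway-name color-gateway --gateway-route-name color-gateway-route-1 \
    --spec '{"httpRoute": {"match": {"prefix": "/router"}, "action": {"target": {"virtualService": {"virtualServiceName": "blue.colors.local"}}}}}'
```
Remember to revert the change (or re-create the route from its original manifest) before moving on, so your loop output continues to match the rest of this guide.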
### Load Balancing the Virtual Gateway
Awesome - we now have an ECS task that implements our color mesh's virtual gateway with 2 routes, one to each of the `blue` and `router` virtual services. However, what happens if our gateway becomes really popular? The first thing we'd need to do is scale up the service, introducing more tasks to handle the higher traffic load - but then we'd have multiple access points: one public IP for every task implementing the virtual gateway. We need a **single access point** for all of these tasks, and we can improve our mesh's security by only allowing gateway access through that access point. We can achieve both of these design requirements by using [**Elastic Load Balancing**](https://aws.amazon.com/elasticloadbalancing/) in conjunction with ECS.
<div style="background-color: #88f7b3; padding: 2rem 2rem; margin-bottom: 1.5rem;">
<b>Note</b>: AWS recommends <a href="https://docs.aws.amazon.com/app-mesh/latest/userguide/virtual_gateways.html#deploy-virtual-gateway"><b>using an NLB</b></a> to load balance tasks implementing a virtual gateway with App Mesh.
</div>
Let's bring down the 1-task service implementation of our virtual gateway and replace it with a [**Network Load Balancer**](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/load-balancer-types.html#nlb) (NLB) solution - a sketch of the teardown follows below. We then need to create the NLB and its associated [**target group**](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html) before we can use these resources with an ECS service.
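As a minimal teardown sketch - assuming the service was created with the name `gateway-svc` earlier, and using the `--force` flag so ECS deletes the service without first scaling it to zero:
```sh
# Remove the single-task gateway service created earlier.
aws ecs delete-service --cluster <cluster-name> --service gateway-svc --force
```
With the old gateway service out of the way, we'll start by creating the NLB.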
```json=
// 04e-load-balancer.json
{
    "Name": "gateway-lb",
    "Subnets": [ <your-vpc-public-subnets> ],
    "SecurityGroups": [ <colorapp-sg>, <gateway-sg> ],
    "Scheme": "internet-facing",
    "Type": "network",
    "IpAddressType": "ipv4"
}
```
```sh
aws elbv2 create-load-balancer --cli-input-json file://04e-load-balancer.json
```
What have we just created? Let's review the load balancer specification.
- We are creating a `network` load balancer (NLB).
- Load balancers on AWS use [**Elastic Network Interfaces**](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html) as [**ELB nodes**](https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html#availability-zones), which are used to distribute traffic to targets. The `Subnets` entry allows us to tell the NLB which subnets in our mesh's VPC these nodes should be launched into. For a publicly-accessible load balancer, we will launch these nodes into the public subnets of our VPC.
- We move the `gateway-sg` security group to the load balancer, since this will be our new point of access into the mesh, and we also attach `colorapp-sg` to give the load balancer nodes permission to route traffic to the virtual gateway tasks in ECS. When we get to launching those tasks, we'll remove `gateway-sg` from them, as we won't be accessing those tasks directly anymore.
- Because we want this NLB to be internet-accessible, the `Scheme` must be `internet-facing`.
Load balancers need target groups, where we'll be able to register the targets that we want traffic routed to. Let's create a target group for our load balancer.
```json=
// 04e-target-group.json
{
    "Name": "gateway-tg",
    "Protocol": "TCP",
    "Port": 80,
    "VpcId": "<your-mesh-vpc-id>",
    "HealthCheckEnabled": true,
    "HealthCheckIntervalSeconds": 10,
    "HealthCheckTimeoutSeconds": 5,
    "HealthyThresholdCount": 2,
    "UnhealthyThresholdCount": 3,
    "TargetType": "ip",
    "IpAddressType": "ipv4"
}
```
```sh
aws elbv2 create-target-group --cli-input-json file://04e-target-group.json
```
- We have 2 options for registering targets to a target group - by instance ID, or IP address. Because our tasks run in Fargate (and they don't run on EC2 instances), we must register the targets by `ip`.
- We've defined a health check for our target group, but we haven't specified a port for the health check. The default behavior here is for the health check to be run on the same port on which the application will receive traffic, which is `"Port": 80`.
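Both `create-load-balancer` and `create-target-group` return the new resource's ARN in their output. If you've since lost track of them, here's a quick sketch for pulling the ARNs back out of the CLI, assuming the resource names used above:
```sh
# Retrieve the ARN of the NLB we just created.
aws elbv2 describe-load-balancers --names gateway-lb \
    --query 'LoadBalancers[0].LoadBalancerArn' --output text
# Retrieve the ARN of the target group we just created.
aws elbv2 describe-target-groups --names gateway-tg \
    --query 'TargetGroups[0].TargetGroupArn' --output text
```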
At this point in time, the NLB and target group are separate entities; there is no relationship between these components. We need to create a **listener** on the NLB to bind the target group - here's how we do that.
```json=
// 04f-nlb-listener.json
{
    "LoadBalancerArn": "<your-nlb-arn>",
    "Protocol": "TCP",
    "Port": 80,
    "DefaultActions": [
        {
            "Type": "forward",
            "TargetGroupArn": "<your-target-group-arn>"
        }
    ]
}
```
```sh
aws elbv2 create-listener --cli-input-json file://04f-nlb-listener.json
```
Remember to add your NLB and target group ARNs from the commands we ran previously. We are creating a listener that will accept TCP traffic on port 80 - the same port on which our virtual gateway has a route. Let's now create our gateway service in ECS again, but this time, we'll specify the target group in the service definition, allowing ECS to register the task IP addresses as targets.
```json=
// 04g-gateway-service-load-balanced.json
{
    "cluster": "<cluster-name>",
    "serviceName": "gateway-svc-load-balanced",
    "taskDefinition": "gateway",
    "loadBalancers": [
        {
            "targetGroupArn": "<your-target-group-arn>",
            "containerName": "envoy-proxy",
            "containerPort": 80
        }
    ],
    "desiredCount": 2,
    "launchType": "FARGATE",
    "platformVersion": "LATEST",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": [ <list-of-private-subnets> ],
            "securityGroups": [ "<colorapp-sg>" ],
            "assignPublicIp": "DISABLED"
        }
    },
    "schedulingStrategy": "REPLICA",
    "deploymentController": { "type": "ECS" }
}
```
```sh
aws ecs create-service --cli-input-json file://04g-gateway-service-load-balanced.json
```
We've disabled public IP assignment for the tasks and moved them into the private subnets of our VPC to prevent out-of-band access, forcing all traffic originating from outside our VPC to go through the NLB in the public subnets.
Once the ECS tasks are running and their private IP addresses appear in the `gateway-tg` target group (indicating that ECS has successfully registered them as IP targets), you can `curl` the DNS name of your load balancer - or open it in a browser - and hit the same routes as before, `/` and `/router`. You should receive the same responses as when you queried the virtual gateway task's public IP address directly, except this time you're reaching those tasks through the NLB.
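Here's a hedged sketch for checking both of those things from the CLI - target registration and the NLB's DNS name - assuming the resource names used earlier in this section:
```sh
# Confirm that both gateway tasks are registered and passing their health checks.
aws elbv2 describe-target-health --target-group-arn <your-target-group-arn>
# Grab the NLB's public DNS name and hit both gateway routes through it.
NLB_DNS=$(aws elbv2 describe-load-balancers --names gateway-lb \
    --query 'LoadBalancers[0].DNSName' --output text)
curl http://$NLB_DNS/
curl http://$NLB_DNS/router
```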
**Congratulations!** You've just configured a simple service mesh using App Mesh on Amazon ECS. Let's recap the architecture so far.
1. We started by setting up the Cloud Map **service discovery namespace** and the parent **service mesh**.
2. We introduced the first 2 **virtual nodes**, `red` and `blue`, and demonstrated ***asymmetrical communication*** between virtual nodes.
3. We added another 2 virtual nodes, `green` and `yellow`, and put them behind a **virtual router**. We also discussed the DNS requirements for the virtual router.
4. We introduced a **virtual gateway** to make our meshed applications accessible to traffic sources outside the mesh through a carefully controlled access point, and incorporated **Elastic Load Balancing** to distribute traffic between multiple instances of the virtual gateway.
This concludes our tour of the core components of App Mesh, and you should now have the knowledge (and hopefully the confidence!) to start building your own service meshes on Amazon ECS. If you're just looking for the basics of App Mesh, you can stop here. However, if you'd like to learn more about service mesh network security and observability solutions, read on! We're about to take a deep dive into some of the more complex topics.
**This guide is continued in [Part 2](https://hackmd.io/@ForsakenIdol/app-mesh-on-amazon-ecs-part-2).**
*Written by [ForsakenIdol](https://forsakenidol.com/).*