# Hypershift AWS Footprint
## Management cluster costs
### Compute (assuming this is the majority of cost)
m5.4xlarge (16 vCPU, 64GB) at ~$550/mo on-demand. It seems the reserved discount can be quite significant (up to 72%), so I'm not sure how valuable this number is, but it is the only firm number we have, so consider it the worst case.
3 zone-spread workers can host ~12 HA HCPs
Current monthly cost per HCP: 3*550/12 = $137.50/mo
Target monthly cost per HCP: ??? "Call for price!" "Price too low to print!"
Density is largely dictated by kube-apiserver memory usage (1.75Gi per pod, 5.25Gi for HA HCP)
- **NOTE(dan)**: I don't think we have any good estimate of etcd resource consumption absent some kind of reference customer workload projection? Do we have data from ROKs or something we could extrapolate from in the meantime? What about a worst-case estimate based on the intrinsic limits of etcd (8Gi is the documented recommended max db size)? We're currently limiting them to 4Gi.
Total single replica HCP size is currently 4Gi. Hoping to trim that to 3.25Gi with catalog fixes.
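Almost all HCP components (including all of the memory-heavy ones) go to 3 replicas with zone-spread when HighlyAvailable, so each HA HCP places one replica on each of the 3 workers. Thus maximum theoretical density at 3.25Gi per replica is 64/3.25 ~= 19.7 (call it 20) HA HCPs per 3 workers, with a lowest theoretical on-demand price of 3*550/20 = ~$82.50.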
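
The arithmetic in this section can be reproduced with a short script. This is a rough sketch, not a sizing tool: it assumes memory is the only constraint, treats the full 64GB of each worker as usable (ignoring kubelet and system reservations), and assumes each HA HCP puts exactly one replica on each of the 3 workers. The function and constant names are made up for illustration.

```python
NODE_MEMORY_GIB = 64     # m5.4xlarge memory, from the notes above
NODE_MONTHLY_USD = 550   # approximate on-demand price, from the notes above
WORKERS = 3              # one worker per zone


def max_ha_hcps(replica_gib: float) -> float:
    """Theoretical number of HA HCPs that fit on the 3 workers.

    Each HA HCP contributes one replica per worker, so the bound is
    simply how many replicas fit in one worker's memory.
    """
    return NODE_MEMORY_GIB / replica_gib


def monthly_cost_per_hcp(hcp_count: float) -> float:
    """On-demand cost of the 3 workers divided across the hosted HCPs."""
    return WORKERS * NODE_MONTHLY_USD / hcp_count


if __name__ == "__main__":
    print(f"current (12 HCPs): ${monthly_cost_per_hcp(12):.2f}/mo")
    for replica_gib in (4.0, 3.25):
        density = max_ha_hcps(replica_gib)
        cost = monthly_cost_per_hcp(density)
        print(f"{replica_gib}Gi per replica: {density:.1f} HCPs, ${cost:.2f}/mo each")
```

Swapping in a reserved-instance price for `NODE_MONTHLY_USD` gives the corresponding best-case numbers.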
### Other resources
(Dan) Are ingress/egress costs in AWS ever a factor in any of our configurations?
## Guest AWS resources (diffs from standalone OCP in bold)
* VPC
* Public subnet (per zone)
  - (dan) throughout, does "per zone" mean "1 or 3 depending on HA"?
  - zone configuration is independent of HCP HA settings and is out-of-band wrt Hypershift. The `hypershift` CLI is opinionated toward single-zone, but only because that is the simplest way.
  - If you ask for an HA cluster, we won't schedule components within the same zone. What I am wondering is: if you ask for an HA cluster, do you need e.g. 3 IGWs, or does 1 IGW service all zones? (Same question for the other zonal resources.) A diagram would probably help me on that.
* Private subnet (per zone)
* Internet Gateway (per zone)
* Route Table (per zone)
* NAT Gateway (per zone)
* S3 VPC Endpoint
* **OIDC provider**
* **IAM roles for cloud-controller and node-pool (operators in HCP i.e. mgmt cluster) with OIDC provider as trusted entity** (after https://github.com/openshift/hypershift/pull/389)
* IAM roles for ingress, image-registry, aws-ebs-csi-driver (operators in guest cluster) with OIDC provider as trusted entity
* IAM instance role for workers
* IAM instance profile for workers
* Security Group for workers
* Route53 private zone for clusterName.baseDomain
* Route53 public zone for baseDomain (needs to preexist)
* if endpointAccess is PrivateAndPublic or Private
  * **kube-apiserver and ingress VPC Endpoints**
  * **Route53 private zone for Endpoint resolution** (still working this out, might not be required)
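
For the IAM roles above that list the OIDC provider as trusted entity, the trust relationship is a standard web-identity trust policy. The sketch below builds one in Python to show its shape. The account ID, issuer host, namespace, and service account name are hypothetical placeholders, not values Hypershift is known to use.

```python
import json


def oidc_trust_policy(account_id: str, issuer_host: str,
                      namespace: str, service_account: str) -> dict:
    """Build a trust policy letting one service account assume a role.

    issuer_host is the OIDC issuer URL without the https:// prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer_host}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Restrict the role to a single service account token.
                        f"{issuer_host}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # All arguments here are illustrative placeholders.
    policy = oidc_trust_policy(
        "123456789012", "oidc.example.com/my-cluster", "example-ns", "example-sa"
    )
    print(json.dumps(policy, indent=2))
```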
## Management AWS resources
* To support HCPs with endpointAccess: Public, the mgmt cluster needs public ingress (at least a default router with endpointPublishingStrategy of type LoadBalancerService)
* An S3 bucket to store OIDC discovery documents and JWKS (after https://github.com/openshift/hypershift/pull/636)
### Per HCP
* EBS volume for etcd PV
  - (dan) compute section assumes HA, should these also? would be 3x PVs, dunno if the cost difference is considered nominal or what
* switch endpointAccess:
  * case Public:
    * kube-apiserver ELB (internet-facing)
  * case PrivateAndPublic:
    * kube-apiserver ELB (internet-facing)
    * kube-apiserver and ingress NLBs (internal)
    * kube-apiserver and ingress VPC Endpoint Services
  * case Private:
    * kube-apiserver and ingress NLBs (internal)
    * kube-apiserver and ingress VPC Endpoint Services
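
The switch above can be written as a small lookup, which makes it easier to see that PrivateAndPublic is just the union of the other two cases. This is only an illustration of the list, with made-up names. It is not how the operator is implemented, and it leaves out the etcd EBS volume because that does not depend on endpointAccess.

```python
from enum import Enum
from typing import List


class EndpointAccess(Enum):
    PUBLIC = "Public"
    PRIVATE_AND_PUBLIC = "PrivateAndPublic"
    PRIVATE = "Private"


PUBLIC_RESOURCES = ["kube-apiserver ELB (internet-facing)"]
PRIVATE_RESOURCES = [
    "kube-apiserver and ingress NLBs (internal)",
    "kube-apiserver and ingress VPC Endpoint Services",
]


def per_hcp_endpoint_resources(access: EndpointAccess) -> List[str]:
    """Return the mgmt-side endpoint resources created for one HCP."""
    resources: List[str] = []
    if access in (EndpointAccess.PUBLIC, EndpointAccess.PRIVATE_AND_PUBLIC):
        resources += PUBLIC_RESOURCES
    if access in (EndpointAccess.PRIVATE_AND_PUBLIC, EndpointAccess.PRIVATE):
        resources += PRIVATE_RESOURCES
    return resources


if __name__ == "__main__":
    for access in EndpointAccess:
        print(access.value, "->", per_hcp_endpoint_resources(access))
```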
## Hypershift Roles and Policies
### Guest Account
#### node-pool (capa-controller, similar to machine-api in standalone OCP)
```
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"ec2:AllocateAddress",
"ec2:AssociateRouteTable",
"ec2:AttachInternetGateway",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CreateInternetGateway",
"ec2:CreateNatGateway",
"ec2:CreateRoute",
"ec2:CreateRouteTable",
"ec2:CreateSecurityGroup",
"ec2:CreateSubnet",
"ec2:CreateTags",
"ec2:DeleteInternetGateway",
"ec2:DeleteNatGateway",
"ec2:DeleteRouteTable",
"ec2:DeleteSecurityGroup",
"ec2:DeleteSubnet",
"ec2:DeleteTags",
"ec2:DescribeAccountAttributes",
"ec2:DescribeAddresses",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeInstances",
"ec2:DescribeInternetGateways",
"ec2:DescribeNatGateways",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeNetworkInterfaceAttribute",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVpcs",
"ec2:DescribeVpcAttribute",
"ec2:DescribeVolumes",
"ec2:DetachInternetGateway",
"ec2:DisassociateRouteTable",
"ec2:DisassociateAddress",
"ec2:ModifyInstanceAttribute",
"ec2:ModifyNetworkInterfaceAttribute",
"ec2:ModifySubnetAttribute",
"ec2:ReleaseAddress",
"ec2:RevokeSecurityGroupIngress",
"ec2:RunInstances",
"ec2:TerminateInstances",
"tag:GetResources",
"ec2:CreateLaunchTemplate",
"ec2:CreateLaunchTemplateVersion",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions",
"ec2:DeleteLaunchTemplate",
"ec2:DeleteLaunchTemplateVersions"
],
"Resource": [
"*"
],
"Effect": "Allow"
},
{
"Condition": {
"StringLike": {
"iam:AWSServiceName": "elasticloadbalancing.amazonaws.com"
}
},
"Action": [
"iam:CreateServiceLinkedRole"
],
"Resource": [
"arn:*:iam::*:role/aws-service-role/elasticloadbalancing.amazonaws.com/AWSServiceRoleForElasticLoadBalancing"
],
"Effect": "Allow"
},
{
"Action": [
"iam:PassRole"
],
"Resource": [
"arn:*:iam::*:role/*-worker-role"
],
"Effect": "Allow"
}
]
}
```
#### cloud-controller (KCM, same as standalone OCP)
```
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"ec2:DescribeInstances",
"ec2:DescribeImages",
"ec2:DescribeRegions",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeVolumes",
"ec2:CreateSecurityGroup",
"ec2:CreateTags",
"ec2:CreateVolume",
"ec2:ModifyInstanceAttribute",
"ec2:ModifyVolume",
"ec2:AttachVolume",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CreateRoute",
"ec2:DeleteRoute",
"ec2:DeleteSecurityGroup",
"ec2:DeleteVolume",
"ec2:DetachVolume",
"ec2:RevokeSecurityGroupIngress",
"ec2:DescribeVpcs",
"elasticloadbalancing:AddTags",
"elasticloadbalancing:AttachLoadBalancerToSubnets",
"elasticloadbalancing:ApplySecurityGroupsToLoadBalancer",
"elasticloadbalancing:CreateLoadBalancer",
"elasticloadbalancing:CreateLoadBalancerPolicy",
"elasticloadbalancing:CreateLoadBalancerListeners",
"elasticloadbalancing:ConfigureHealthCheck",
"elasticloadbalancing:DeleteLoadBalancer",
"elasticloadbalancing:DeleteLoadBalancerListeners",
"elasticloadbalancing:DescribeLoadBalancers",
"elasticloadbalancing:DescribeLoadBalancerAttributes",
"elasticloadbalancing:DetachLoadBalancerFromSubnets",
"elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
"elasticloadbalancing:ModifyLoadBalancerAttributes",
"elasticloadbalancing:RegisterInstancesWithLoadBalancer",
"elasticloadbalancing:SetLoadBalancerPoliciesForBackendServer",
"elasticloadbalancing:AddTags",
"elasticloadbalancing:CreateListener",
"elasticloadbalancing:CreateTargetGroup",
"elasticloadbalancing:DeleteListener",
"elasticloadbalancing:DeleteTargetGroup",
"elasticloadbalancing:DescribeListeners",
"elasticloadbalancing:DescribeLoadBalancerPolicies",
"elasticloadbalancing:DescribeTargetGroups",
"elasticloadbalancing:DescribeTargetHealth",
"elasticloadbalancing:ModifyListener",
"elasticloadbalancing:ModifyTargetGroup",
"elasticloadbalancing:RegisterTargets",
"elasticloadbalancing:SetLoadBalancerPoliciesOfListener",
"iam:CreateServiceLinkedRole",
"kms:DescribeKey"
],
"Resource": [
"*"
],
"Effect": "Allow"
}
]
}
```
### Management Account
#### hypershift-operator
```
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:CreateVpcEndpointServiceConfiguration",
"ec2:DescribeVpcEndpointServiceConfigurations",
"ec2:DeleteVpcEndpointServiceConfigurations",
"elasticloadbalancing:DescribeLoadBalancers",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": "*"
}
]
}
```
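
Since these policy documents are copied around by hand, a quick sanity check before applying them may be worthwhile. The helper below is a minimal sketch, and its name is made up for illustration. It parses a policy document (which fails loudly on malformed JSON) and reports any action listed more than once within a statement.

```python
import json
from collections import Counter
from typing import List


def duplicate_actions(policy_json: str) -> List[str]:
    """Return actions that appear more than once within a single statement.

    json.loads raises json.JSONDecodeError if the document is malformed,
    e.g. because of a missing or trailing comma.
    """
    policy = json.loads(policy_json)
    statements = policy["Statement"]
    if isinstance(statements, dict):  # a single statement is allowed
        statements = [statements]

    duplicates: List[str] = []
    for statement in statements:
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action is allowed
            actions = [actions]
        counts = Counter(actions)
        duplicates.extend(sorted(a for a, n in counts.items() if n > 1))
    return duplicates


if __name__ == "__main__":
    sample = """
    {
      "Version": "2012-10-17",
      "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:PutObject", "s3:DeleteObject", "s3:PutObject"],
         "Resource": "*"}
      ]
    }
    """
    print(duplicate_actions(sample))
```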