# Controlled access to resources in a multicloud environment

## Requirements

### Functional requirements

- The resources are accessible over TCP (SSH, HTTP, Postgres...).
- Users already use a standard company-wide SSO solution (say Microsoft AD).
- Not all users are technical, therefore only a subset of these users should be allowed to access the resources.
- The allowed users have different roles (Dev, DevOps, Support), so access to a resource can be granted on a per-role basis.
- It should be possible to allow a connection to a resource for a limited time (5 minutes for a deployment, 1 hour to investigate an incident...), while some accesses can be permanent (dev environments for developers).
- Some users (administrator role) should have a real-time view of all resources currently being accessed, and should be able to terminate a connection immediately, should they decide to do so.

### Non-functional requirements

- The designed solution should be deployable on premises, within our own cloud environment; you can assume AWS.
- It should handle a volume of a few hundred concurrent connections to the different resources.

## Proposed solution

### Key properties

- Zero trust security with 2FA
- Action audit
- TCP proxy support
- SSO support, namely Microsoft AD
- RBAC, provisioned based on SSO roles
- Timed access based on roles
- Real-time monitoring of established connections
- On-premises
- Scalable

### HashiCorp Boundary

> HashiCorp Boundary is a tool for managing identity-based access for modern, dynamic infrastructure.

### Glossary

1. *Target* -- a target machine a user would like to connect to

### Supported & missing features

* [x] Zero trust security with 2FA -- 2FA can be set up on the Microsoft AD side
* [x] Action audit -- logging of different events (connection, disconnection, etc.) can be set up
* [x] TCP proxy support
* [x] SSO support, namely Microsoft AD
* [x] RBAC, provisioned based on SSO roles
* [x] Real-time monitoring of established connections -- the web UI admin panel has this functionality
* [x] On-premises
* [x] Scalable -- the design supports scaling of both controllers and workers. The RDBMS might become a bottleneck, but that is unlikely as it is not on the proxying data path
* [ ] Timed access based on roles

#### Timed access based on roles

Out of the box, Boundary doesn't support time-limited access: `session_max_seconds` is a parameter of a target, not of a role. But this is a minor gap, which can be worked around by duplicating targets for each type of access, e.g.:

1. `ubuntu-123-1h` -- target `ubuntu-123` with a session max time of 1h
1. `ubuntu-123-12h` -- target `ubuntu-123` with a session max time of 12h

This can be easily automated (via dynamic target discovery), allowing different session times for different users based on RBAC; a sketch of such automation follows.
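Below is a minimal Python sketch of that automation, assuming the `boundary` CLI is installed and already authenticated. The scope ID, host set ID and target name are hypothetical placeholders, and the exact flag names should be verified against the Boundary version in use.

```python
"""Sketch: clone a Boundary TCP target per session duration.

Assumes the `boundary` CLI is on PATH and already authenticated.
Scope ID, host set ID and target name below are hypothetical.
"""
import json
import subprocess

SCOPE_ID = "p_1234567890"        # hypothetical project scope
HOST_SET_ID = "hsst_1234567890"  # hypothetical host set backing ubuntu-123
DURATIONS = {"1h": 3600, "12h": 43200}


def create_timed_target(base_name: str, suffix: str, seconds: int) -> str:
    """Create e.g. `ubuntu-123-1h` with the given session_max_seconds."""
    out = subprocess.run(
        [
            "boundary", "targets", "create", "tcp",
            "-scope-id", SCOPE_ID,
            "-name", f"{base_name}-{suffix}",
            "-default-port", "22",
            "-session-max-seconds", str(seconds),
            "-format", "json",
        ],
        check=True, capture_output=True, text=True,
    )
    target_id = json.loads(out.stdout)["item"]["id"]  # ID of the created target

    # Point the new target at the same hosts as the original one.
    subprocess.run(
        ["boundary", "targets", "add-host-sources",
         "-id", target_id, "-host-source", HOST_SET_ID],
        check=True,
    )
    return target_id


if __name__ == "__main__":
    for suffix, seconds in DURATIONS.items():
        print(create_timed_target("ubuntu-123", suffix, seconds))
```

Per-role timed access then reduces to granting each role session authorization only on the target copy with the appropriate duration.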
#### Overview of the architecture

Boundary's main components:

1. Vault -- stores all credentials
2. Postgres -- RDBMS for all metadata
3. Controller -- responsible for serving all API calls, authentication, access management, etc.
4. Worker -- responsible for proxying traffic between clients and target machines

[Recommended](https://developer.hashicorp.com/boundary/docs/getting-started/installing/production) HA production architecture deployment in AWS:

![](https://i.imgur.com/SAQhxOd.png)

#### Proposed architecture

The proposed architecture essentially copies the recommended AWS architecture from the HashiCorp documentation, except that it includes Azure AD as an identity provider. The target host is kept abstract, as it could be any number of remote or local resources, as long as the workers can reach them. It is assumed that network latency between AWS and the target systems is low. Autoscaling is not illustrated to keep the diagram simple. CloudWatch is used for storing audit logs.

![](https://i.imgur.com/gcDDoiF.png)

#### Downsides

The biggest downside of such a solution is relying on a (to a point) centralized HashiCorp Boundary installation. If it fails (even with an HA deployment), we risk being locked out. To prevent a lockout, we can consider several options:

1. Several completely independent Boundary installations in different clouds, which would make simultaneous downtime highly unlikely, but this is expensive.
1. Cloud providers usually have ways to reset SSH keys/root passwords on instances. We could have automation ready that resets credentials on all machines using a domain admin user. This is not instant but effective, especially if we use one or just a few clouds; one possible AWS-flavoured shape of this is sketched after this list.
1. Having emergency keys in place on all machines. This is tricky, because if they are stolen, an attacker can log in unnoticed and persist in the system.
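To make option 2 concrete under the AWS assumption, here is one possible shape of that break-glass automation, using SSM Run Command to push an emergency SSH key. It assumes the instances are SSM-managed (an assumption beyond what is stated above); the tag filter, user and key are placeholders.

```python
"""Sketch of the break-glass automation from option 2 above.

Pushes an emergency SSH public key to every reachable instance via AWS SSM
Run Command, so access can be restored even if Boundary itself is down.
Assumes instances are SSM-managed; tag filter, user and key are placeholders.
"""
import boto3

EMERGENCY_PUBKEY = "ssh-ed25519 AAAA... break-glass@example.com"  # placeholder key


def reset_ssh_access(region: str = "eu-west-1") -> str:
    """Overwrite authorized_keys on all tagged instances; returns the command ID."""
    ssm = boto3.client("ssm", region_name=region)
    shell = (
        "install -d -m 700 -o ubuntu -g ubuntu /home/ubuntu/.ssh && "
        f"echo '{EMERGENCY_PUBKEY}' > /home/ubuntu/.ssh/authorized_keys && "
        "chown ubuntu:ubuntu /home/ubuntu/.ssh/authorized_keys && "
        "chmod 600 /home/ubuntu/.ssh/authorized_keys"
    )
    resp = ssm.send_command(
        Targets=[{"Key": "tag:managed-by", "Values": ["boundary"]}],  # hypothetical tag
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [shell]},
        Comment="Emergency SSH credential reset",
    )
    return resp["Command"]["CommandId"]


if __name__ == "__main__":
    print("Run Command ID:", reset_ssh_access())
```

Other clouds expose similar credential-reset mechanisms through their own APIs, so the same approach can be extended per provider if more than one cloud is in use.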