# DevSecOps Roadmap

Label relative necessity:
* -1- trivial/stretch goal
* -2- nice to have
* -3- want to have
* -4- important
* -5- need/essential

TODO: ~~reduce epic size,~~ create tickets for 3 epics below, architecture diagrams

## Historical Priorities

#### Improved DevOps infrastructure:
- Continuous deployment reliability and robustness improvements
- Production operational responsiveness
- End-to-End testing aka integration smoke tests
- Cybersecurity assessment, scanning, and vulnerability management

#### Additional backlog features:
- Improved Kibana Reporting and Analytics
- User Management UI (OFA Admins and Regional Staff)
- User Access Release 3.5+ improvements
- Research-oriented environment w/ Blue/Green testing
- User-feedback features and bugs not yet accounted for

## Current DevOps Priorities
* [~~CI path-filtering front/back separation~~](https://github.com/raft-tech/TANF-app/issues/2457)
* [single clamav scanner](https://github.com/raft-tech/tanf-app/issues/2429)
* [nexus standup](https://github.com/raft-tech/tanf-app/issues/2116)
  - [~~split out gunicorn_start.sh~~](https://github.com/raft-tech/TANF-app/issues/2347)
  - deploy custom manifest [here](https://github.com/cloudfoundry/python-buildpack#building-the-buildpack)
  - docker directly (ATO questions)
  - remove makemigrations for deploys
  - value: deployment speed, reduced complexity, stable dev exp
* [spike on pipeline refactor](https://github.com/raft-tech/TANF-app/issues/2419)
* [catch migration failures in pipeline](https://github.com/raft-tech/TANF-app/issues/1623)
* deploy over failed deployments
* split out use-cases, draft
* [integrate hashicorp vault into deployments](https://github.com/raft-tech/tanf-app/issues/2225)
* database seed
  - to be drafted by George
  - replace deploy-backend.sh
  - terraform?
  - value: prevent unplanned debugging of failed deploys
  - spike for researching the cloudfoundry/cloud.gov way to do our deployments

History: https://app.mural.co/t/raft2792/m/raft2792/1677172199979/6dae0de24469eb9093af73bdf45a6f1760659abc?sender=60470f8f-3c49-4c8d-a8b8-939dc74f72f7

### Continuous Integration Revamp - Nov/Dec/Jan
* -5- 1373, 2000, var storage, sending vars #circleci cfg
  - value: reduced complexity, reduced tech debt, break up monolith
* -4- 2115, 2116 # pre-built docker image and/or improve buildpacks
  - value: reduced build time, fewer local errors, simpler circleci processes

### Integration test - Nov/Dec/Jan
* -5- 310, 2141 - initial cypress setup (authenticated "hello world")
* -5- 2274 - user approval
* -5- 2282 - file upload
* -2- 1492 - pa11y improvement

### Continuous Deployments - Feb/March
* 1623 - fail pipeline if deployment fails; need to split up tasks
* 2429 - singular prod ClamAV scanner
  - value: frees up a large amount of RAM, possibly enough for more dev environments
* -5i- terraform on-demand CD (branch-name vs sandbox)
  - value: replaces brittle bash scripts, state-machine/desired states, idempotency, infrastructure-as-code, no manual intervention needed to fix, robustness/confidence
* -3i- better deployment condition checks
  - value: shorter feedback loop while developing
* -3- redeploy over previous failed build
  - value: robustness, certainty that we always have the app running
* -3i- UX demo environment # sub-epic
* -3i- true rolling deployments/investigate --no-route
  - value: no downtime unless code is broken

### Cloud.gov improvements - Apr/May?
* -4i- domain name mocking for dev/staging
  - value: testing system-tier features before prod, confidence boost
* -2i- live env var reloads
  - value: develop in deployed environments, flexible deployed envs
* -1- domain name mocking for local

### Continuous Integration Improvements - Apr (should be done after CD revamp)
* -3i- 1623 # circleci catch migration/deployment failures
* -4i- 831 # logging aggregation
* -3i- 2242, 1854 # bugs: trufflehog, overwriting env vars for different spaces
* -3i- 1620, 215, 216 # what's deployed where
  - value: lineage of code/traceability
* -1- 1705, 1786 # linting
  - value: catch silent lint failures for deployments
* -1- 1530 # container optimization
* -1??- circleci frontend/backend separation # don't rebuild/retest unchanged code

### Key Management - March
* -3- 1977, 2225 # hashicorp vault
  - value: secure way to rotate jwt_keys, infra-as-code, supplants part of bash deploy scripts
* -4i- 2118 # persist env vars across rebuilds
  - value: clean/efficient deployments, prevent errors, infra-as-code, robustness
* -1- automated key rotation
* 2435 - env vars for staging/prod

### Continuous Deployments Enhancements - May
* -3i- terraform auto-deploy all branches # branchname.raft.aws.com vs branchname.app.cloud.gov
* -3- terraform auto take-down
  - value: cleanup, prevent littering, free space, prevent staleness
* -2i- auto-rollback on failed deploys
  - value: mainly useful for prod; pairs well with 215

### Security Level Up - Apr/May
* -2i- SAST/DAST - static/dynamic code analysis
* -1i- authenticated app crawler/exploiter
* -3i- addressing vulnerable dependencies/version updates

### ops dashboard/notifications
- sentry for first pass (831)
- live monitoring feeds for prod, develop
- splunk, logstash, datadog
- smoke test/pings
- dynamically control "down for maintenance" page

## Lift-shift off Cloud.gov - Docker v. buildpacks
Start once ATO questions are answered.
No timeline; this might end up being a big "why" for contract renewal.

Lift-shift to AWS GovCloud # blocked by 2115/2116 as well as ATO
- ATO questions
- evaluate dev-local CF
- hardening dockerfiles
- simplifying docker-compose