# TDRS Offboarding / Knowledge Sharing

## Accounts to be Deactivated or Removed

- [ ] GitHub Repo Admin
- [ ] Django Admin on staging and dev spaces
- [ ] Circle CI
- [ ] Cloud.gov
- [ ] Login.gov

See also: [Carl's offboarding hackmd](https://hackmd.io/47rLsTQtS9KzTJl1EQ7xgg)

## Open Issues Authored by Me

- [As a dev, I want to be on the latest stable version of Python #782](https://github.com/raft-tech/TANF-app/issues/782)
  Since this ticket was opened we upgraded our Docker Python version to 3.8.9, and the current default version for buildpacks is 3.8.12. However, [Python 3.10](https://www.python.org/downloads/release/python-3100/) is the latest stable release. This is a lower-priority upgrade, but it brings some security fixes and new features (like [structural pattern matching](https://www.inspiredpython.com/course/pattern-matching/mastering-structural-pattern-matching) :fire:).
- [Secret Key Leakage Mitigation for Release 1 #972](https://github.com/raft-tech/TANF-app/issues/972)
  All issues are closed, but the epic was left open until `jwt-key-rotation.md` is updated to reflect the manual process of rotating `DJANGO_SECRET_KEY`.
- [Uploads: Implement Versioning for Data Files #1007](https://github.com/raft-tech/TANF-app/issues/1007)
  Currently, re-uploading a given section for the same STT and year will overwrite the file in S3. Versioning needs to be properly implemented so that previous files are not lost. This only occurs in Cloud.gov deployed environments, as localstack has versioning enabled.
- [Automate inactivity teardown for Cloud.gov deployments in the dev space #1060](https://github.com/raft-tech/TANF-app/issues/1060)
- [Django Admin: CORS errors loading fonts from S3 #1143](https://github.com/raft-tech/TANF-app/issues/1143)
- [Elasticsearch Implementation #1348](https://github.com/raft-tech/TANF-app/issues/1348)
  This epic is critical to the completion of the parsing work, but most of its issues could be worked in parallel to parsing work, up until #1354.
- [Terraform: Automate Deployment of Elasticsearch #1349](https://github.com/raft-tech/TANF-app/issues/1349)
- [As an OFA staff member, I want to be able to access Kibana from outside of AWS GovCloud #1350](https://github.com/raft-tech/TANF-app/issues/1350)
- [Elasticsearch - Define Index Mappings #1351](https://github.com/raft-tech/TANF-app/issues/1351)
- [Implement task queuing / scheduling system in Django #1352](https://github.com/raft-tech/TANF-app/issues/1352)
- [Create ADR to document decision to move forward with Elasticsearch #1353](https://github.com/raft-tech/TANF-app/issues/1353)
- [Implement storage of parsed data in Elasticsearch #1354](https://github.com/raft-tech/TANF-app/issues/1354)
- [Spike: Research questions around DIGIT teams query usage for parsed data #1355](https://github.com/raft-tech/TANF-app/issues/1355)
- [ZAP False Positives - Unexpected Content Type & Client Error response code #1455](https://github.com/raft-tech/TANF-app/issues/1455)

## Thoughts on Future Work Efforts

### User Access Request

Currently, when a user fills out the `Request Access` form the modifications are applied immediately to their instance of the User model. This allows a user with the `Data Analyst` role to arbitrarily change their STT and gain access to Data Files from other grantees. Additionally, a Regional Manager or System Admin user has no queue to work from to manage these requests and approve or deny them.
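To illustrate the gap, here is a minimal sketch in plain Python. The `User` class and `set_profile` function below are hypothetical, simplified stand-ins for the real model and endpoint, just enough to show why immediate mutation is a problem:

```python
from dataclasses import dataclass


# Hypothetical stand-in for the real User model; fields are illustrative only.
@dataclass
class User:
    role: str
    stt: str  # determines which grantee's Data Files this user can access


def set_profile(user: User, requested_stt: str) -> None:
    # Current behavior: the requested change is applied immediately,
    # with no approval queue for a Regional Manager or System Admin.
    user.stt = requested_stt


analyst = User(role="Data Analyst", stt="Alabama")
set_profile(analyst, "Alaska")
# The analyst can now access another grantee's Data Files, unreviewed.
```

The intermediary model described below would replace the direct assignment with a pending request that only takes effect after review.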
Without the intermediary model there would need to be more complicated database filtering on the User model to appropriately derive a list of access requests, and there is no outright mechanism for an approval/deny process. An intermediary model could look like this pseudocode:

```
class AccessRequestManager(Manager):
    def pending(self) -> QuerySet:
        return self.filter(state='pending')


class AccessRequest(Model):
    user = ForeignKey(User, related_name='access_requests')
    reviewed_by = ForeignKey(User, blank=True, null=True,
                             related_name='reviewed_access_requests')
    state = CharField(choices=[('approved', 'Approved'),
                               ('denied', 'Denied'),
                               ('pending', 'Pending')],
                      default='pending')
    requested_stt = ForeignKey(STT)
    requested_region = ForeignKey(Region)
    # ManyToManyField since Django has no OneToManyField;
    # this could also be a ForeignKey
    requested_roles = ManyToManyField(Group)

    objects = AccessRequestManager()

    def approve(self, reviewed_by: User) -> None:
        self.reviewed_by = reviewed_by
        self.state = 'approved'
        self.save()
        self.user.stt = self.requested_stt
        self.user.region = self.requested_region
        self.user.roles.set(self.requested_roles.all())
        self.user.save()

    def deny(self, reviewed_by: User) -> None:
        self.reviewed_by = reviewed_by
        self.state = 'denied'
        self.save()
```

With that intermediary model we would open the door to several benefits:

- Call the `approve` method on an `AccessRequest` to update both the request and the `User` instance it is tied to.
- Using the above method, a Django Admin action could easily be added to call `approve` for selected access requests.
- The model can be tied to a ModelViewSet to allow easy access to this data for the frontend to populate a Regional Manager admin interface.
- The `approve` method can be called with a detail view on the above viewset to allow Regional Managers to approve these requests through the API/UI.
- Retrieve a user's access requests with `user.access_requests.all()` or narrow them down to just the pending requests using `user.access_requests.pending()`.

Then, instead of making a call to the `set_profile` endpoint when a user fills out the Request Access form, the call would be made to the `AccessRequestViewSet`, which would create a new access request with the default state of `pending`. From there, no changes would be made to the user instance until the `approve` method is called on the `AccessRequest`. `set_profile` could then be repurposed to simply allow a user to change their first and last name.

This model and all the flows associated with it could easily be re-used for existing users who want to request a change in the access they already have.

## GitHub Actions in Use

In addition to Circle CI, this project uses GitHub Actions to run two workflows triggered from the label event on a Pull Request. Both are located in the `.github/workflows` directory in the repository.

### [deploy-on-label.yml](https://github.com/raft-tech/TANF-app/blob/raft-tdp-main/.github/workflows/deploy-on-label.yml)

On assignment of any label starting with `Deploy with CircleCI` to an open PR in GitHub, this action will trigger a deploy job within CircleCI for the branch defined in the given PR.

#### Run Conditions

- On pull request `labeled` event
- If the label assigned starts with `Deploy with CircleCI`
- If all PR status checks are passing

#### Logical Flow

- Checkout latest commit on current branch
- Extract the target deployment environment name from the PR label (`Deploy with CircleCI-raft` => `raft` deploy environment)
- Using the GitHub API, make a request to get the latest status checks for the current branch. This will be used to ensure that all checks have passed in CircleCI prior to commencing the deployment.
  - Leverages [this open source GitHub Action](https://github.com/octokit/request-action)
  - Makes a request to the combined [status checks API in GitHub](https://docs.github.com/en/rest/reference/repos#statuses)
- Extract the combined state from the GitHub API response and store only the state value as an output for future steps.
- If the combined state is `success`, make a request to the V2 CircleCI API to initiate the `dev-deployment` workflow for the current branch.
  - Leverages [this open source GitHub Action](https://github.com/promiseofcake/circleci-trigger-action)

#### Possible Improvements

- The label name could be defined as an environment variable accessible to all job steps by [using an env map](https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#env), similar to how it is done [here](https://github.com/raft-tech/TANF-app/blob/raft-tdp-main/.github/workflows/qasp-owasp-scan.yml#L18). This is mostly a tech debt improvement: it would simply be less fragile and easier to maintain.

### [qasp-owasp-scan.yml](https://github.com/raft-tech/TANF-app/blob/raft-tdp-main/.github/workflows/qasp-owasp-scan.yml)

On assignment of the `QASP Review` label to a PR in GitHub, this action runs the OWASP ZAP scans against both the frontend and backend for that specific branch in Circle CI.

#### Run Conditions

- On pull request events `labeled`, `synchronize`* or `reopened`.
  - `synchronize`* refers to the event that occurs after new commits have been pushed to an open PR. This allows the action to re-run ZAP scans for new commits after moving into QASP review.
  - For the `labeled` event there is additional filtering to ensure that this doesn't run when a PR already has the `QASP Review` label and a new label is being assigned.
- If the PR has the `QASP Review` label assigned
- If there are no in-progress or queued jobs
  - The action uses [concurrency](https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#concurrency) to ensure that only a single ZAP scan will run for a given branch at any time.
    - If there is an in-progress job for the branch, it will first be cancelled
    - Any jobs queued during the cancellation will be paused
    - All jobs except for the latest will be cancelled
    - Only one remaining job will run and invoke the Circle CI API
  - Note: after Circle CI has initiated the job and responded to GitHub, the action will be marked complete, so any new actions queued immediately after can still cause a second run. This is a very small race condition, though, and unlikely unless a developer is making many successive commits to an open branch in QASP review.

#### Logical Flow

- Set environment variable `HAS_QASP_LABEL` by using a contains expression on the `github.event.pull_request.labels` property.
- Perform additional filtering to ensure the `labeled` event doesn't run if the label in question isn't `QASP Review`.
- If the `HAS_QASP_LABEL` variable is `'true'`, make a request to the Circle CI V2 API to initiate the `run_owasp_scan` workflow.

#### Possible Improvements

- The YAML file is missing a doc block with this information.

## Documentation Organization

I would definitely recommend making the main README more of a table of contents with links to docs in subdirectories. Currently there is some duplicated information between the main README and the READMEs in the backend and frontend folders. Apart from needing to update links inside the markdown that point to relative paths, I don't see any issues with moving all docs into subdirectories in the /docs folder.
Do note that these three directories are explicitly referred to using relative paths in Circle CI config and other scripts:

- `/scripts`
- `/tdrs-backend`
- `/tdrs-frontend`

If those folders are moved, the associated references would need to be updated or errors would occur in CI or during deployments.

## Writing Issues

In my opinion, an issue should always start with a summary of the current condition and an explanation of why it needs to be changed or improved. Screenshots and error stacktraces are invaluable information to include in issues. Strive for brevity while still including all of the relevant context: working an issue without context can lead to the wrong choices, while too much information in the issue can lead to over-engineering.

### Acceptance Criteria

These should focus on the business requirements, not necessarily the technical steps needed to get there.

### Tasks

Focus on the *what*, not necessarily the *how*. These should be technical in nature, while imposing as few restrictions on the implementation as possible to keep it in line with the business requirements. During development, new information is constantly learned, and it can be difficult to plan ahead for every possible hurdle involved in implementing a ticket. By leaving tasks open-ended, a developer has much more flexibility to overcome unforeseen issues without having to mark the issue as blocked and send it back to refinement.

#### Example

##### Problem

CORS error in Django Admin when loading fonts

##### Acceptance Criteria

No CORS errors occur in Django Admin

##### Tasks

Resolve CORS errors in Django Admin

Note that the tasks don't say *how* to fix those errors. Suppose the errors could be fixed by adding a header to allow CORS for that resource, and we made a task to explicitly add that header.
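For concreteness, such a prescriptive task might have amounted to something like the following sketch. This is hypothetical: the function name, extension list, and wildcard origin are illustrative, not taken from the actual codebase or the issue.

```python
# Hypothetical sketch: attach a CORS header to font responses so the
# browser will load fonts served from another origin (e.g. S3).
FONT_EXTENSIONS = (".woff", ".woff2", ".ttf", ".eot")


def add_font_cors_header(path: str, headers: dict) -> dict:
    """Allow cross-origin loading of font files."""
    if path.endswith(FONT_EXTENSIONS):
        headers["Access-Control-Allow-Origin"] = "*"
    return headers


headers = add_font_cors_header("/static/fonts/SourceSansPro.woff2", {})
# headers now contains the Access-Control-Allow-Origin entry
```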
However, now consider that when the developer starts to work on the issue, they discover that the fonts producing the error are never even used. Adding the header would serve no practical benefit since the fonts still wouldn't be used. By keeping the task focused on what needs to be done, the developer can make judgment calls like this and flag them during the review process, rather than pushing the issue back into refinement and marking it as blocked until a decision is made on whether to deviate from the assigned task. That said, if such potential solutions are identified during ticket refinement or preliminary reviews, it is helpful to note them in the description of the issue as a potential way to complete the tasks.

## Helpful Aliases / Functions

These can be added to the `.bashrc`, `.zshrc` or `.bash_profile` on a Linux or macOS system (depending on your shell/system configuration) and offer a few commands that make it easier to work with the TDRS platform during local development.

```
# Replace this with the path matching the cloned repo on your system
TDRS_HOME="$HOME/Raft/Repositories/TANF-app/"

# TDRS Backend aliases
alias cd-tdrs-backend='cd "$TDRS_HOME/tdrs-backend"'
alias tdrs-backend-compose='cd-tdrs-backend && docker-compose -f docker-compose.yml -f docker-compose.local.yml'
alias tdrs-backend-down='tdrs-backend-compose down'
alias tdrs-backend-up='tdrs-backend-compose up -d web'
alias tdrs-backend-restart='tdrs-backend-compose restart web'
alias tdrs-backend-hard-restart='tdrs-backend-down && tdrs-backend-up'
alias tdrs-backend-rebuild='tdrs-backend-down && tdrs-backend-compose up --build -d web'
alias tdrs-backend-lint='tdrs-backend-compose run --rm web bash -c "flake8 ."'
alias tdrs-backend-exec='tdrs-backend-compose exec web /bin/bash'
alias tdrs-shell='tdrs-backend-compose run --rm web bash -c "python manage.py shell_plus"'

# TDRS Frontend aliases
alias cd-tdrs-frontend='cd "$TDRS_HOME/tdrs-frontend"'
alias tdrs-frontend-compose='cd-tdrs-frontend && docker-compose -f docker-compose.yml -f docker-compose.local.yml'
alias tdrs-frontend-down='tdrs-frontend-compose down'
alias tdrs-frontend-up='tdrs-frontend-compose up -d tdp-frontend'
alias tdrs-frontend-restart='tdrs-frontend-compose restart tdp-frontend'
alias tdrs-frontend-hard-restart='tdrs-frontend-down && tdrs-frontend-up'
alias tdrs-frontend-rebuild='tdrs-frontend-compose down && tdrs-frontend-compose up --build -d tdp-frontend'

# Run pytest in the backend container, passing through any arguments
function pytest-tdrs () {
    if [ "$#" -lt 1 ]; then
        quoted_args=""
    else
        quoted_args="$(printf " %q" "${@}")"
    fi
    tdrs-backend-compose run --rm web bash -c "./wait_for_services.sh && pytest ${quoted_args}"
}

# Extract the JWT private key from a Cloud.gov-deployed backend and base64 encode it
function get_cg_jwt_key () {
    raw=$(cf env tdp-backend-raft | sed -n -e '/JWT_KEY/,/END PRIVATE KEY/ p')
    priv_key=${raw#"JWT_KEY: "}
    echo "$priv_key" | openssl enc -A -base64
}
```

## Skills to Look For in a New Developer

### Python

Probably the most important skill. I see most of the heavy lifting remaining for TDRS in the parsing epic, which will be done primarily in Python. There is enough work in that epic to make it an "all hands on deck" kind of situation, so any new developer on the project would need to be fluent in Python so they can jump right in. I would recommend at least 3 years of experience.

### Django

Django is a great framework, but it has a lot of "magic" that happens behind the scenes. Having someone deeply familiar with Django internals would be a tremendous help to the team. Coming from other frameworks, it is easy for a developer to find themselves working against this "magic", while someone with a couple of years or more of experience in the framework would be able to avoid those pitfalls and work with the framework to fulfill technical requirements.

### Elasticsearch / Opensearch

Another critical piece of the remaining work on TDRS.
While a lot of the difficult administrative tasks are abstracted away for us by Cloud.gov and AWS, there is still a lot about how ES works that takes time for a developer to learn. Finding someone who has already worked with Elasticsearch would be a great benefit. As an FYI, Opensearch is essentially the same thing: it is the name given to the open source fork of Elasticsearch created by AWS. Do note that they have diverged from each other, and as time goes by they will become less equivalent. Based on Cloud.gov documentation, it is currently running AWS Elasticsearch version 7.4 (pre-divergence), but since AWS is moving everything toward Opensearch I can see Cloud.gov making that move as well.

### Docker

Docker is still used pretty heavily in local development. Most developers I have encountered recently have experience with Docker, so this should be an easy find.

### Cloud Foundry

Cloud.gov is basically the government flavor of Cloud Foundry. Explicitly listing Cloud Foundry as the requirement will likely draw more candidates from the public sector who haven't worked with Cloud.gov but have all the relevant technical knowledge.

### Circle CI or similar CI tool

CircleCI, Jenkins, GitHub Actions and other common CI tools are usually pretty similar. As long as the developer understands YAML and how to test and build an application, they should be able to adjust pretty easily.

### JavaScript

React specifically would be a nice-to-have, but I feel the frontend side is pretty well covered by the current remaining dev team. It would still be good to have someone at least familiar with common JS frameworks and syntax to help with reviews or to backfill during vacations, etc.