# (Actual) Continuous Integration for Backtest
<!-- Put the link to this slide here so people can follow -->
slides: https://hackmd.io/eWd-mxljQAGPHO59-2gZ9Q
---
## What is CI/CD?

---
## The promises of CI
- Reproducible, deterministic builds and deployments
- No manual steps involved
- Human approval steps can still be part of the pipeline
- Automate correctness checking of code before it's in production
- A/B testing with rollback capabilities
---
## Current Development Process
- Check out a scripts folder from Perforce
- Activate an arbitrary conda env (e.g. `sotqrenv`)
- Possibly add new deps to your conda environment as needed
- Develop and test the script
- Write unit tests to verify your changes (sometimes)
- Manually run the Python script on a Linux console for "integration" testing
- Use Jupyter notebooks to analyze the generated results
- Rinse/repeat
---
## Current Deployment Process
- Backtest: fork or edit the existing sigjenkins jobs using the Jenkins web UI
- For TPO: tell Amanda the TeamCity build, the jobs in question, and the conda environment recipe
- In both cases, do a test run in prod and manually roll back if there are problems
  (unfortunately, bad data may briefly exist in prod)
---
## Opinionated Deployment Philosophy
- Container-friendly
- Deploy just one thing and its dependencies (including scripts)
- Everything should be versioned and defined in git (including Jenkins!)
- No human interaction required from start to finish after a GitLab merge to main.
- Verify correctness of the generated data *before* putting them in the production filer paths.
---
## The Pipeline

---
## Conda Environment Deploy
- Build an environment containing exactly what we need to run the script (*including* workflow3!)
  `mamba create -p <scratch-dir>/your-project-1.0.0-py38 yourproject=1.0.0 python=3.8`
- Squash the environment into a `squashfs` image (sketch below)
- Use `rrsync` to copy it to the filer via `pdsyncbal`
- Use standardized filer location (`/nfs/btcache/conda_envs/sotbtdata`)
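A rough sketch of the build-and-squash steps; the paths, version numbers, and the `zstd` compressor choice are illustrative:
```bash
# create the env in scratch at the same absolute path it will be mounted
# at later (conda envs bake absolute paths into scripts, so this matters)
mamba create -y -p "$SCRATCH/your-project-1.0.0-py38" \
    yourproject=1.0.0 python=3.8

# compress the whole env into a single immutable, mountable image
mksquashfs "$SCRATCH/your-project-1.0.0-py38" \
    your-project-1.0.0-py38.squashfs -comp zstd
```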
---
## What the heck is rrsync?
`rrsync`: *a script to setup restricted rsync users via ssh logins*
To use, add a line to `~/.ssh/authorized_keys` for the deploy user (`sotbtdata` here):
```
...
# 2022-05-11 rsync deploy key for gitlab ci
command="$HOME/bin/rrsync-3.2.3.pl /nfs/btcache/conda_envs/$USER"
ssh-ed25519 <public-key> gitlab-ci-deploy-key
```
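With that key in place, the CI job's rsync is confined to the `conda_envs/sotbtdata` subtree, and remote paths are interpreted relative to that root. A hypothetical client-side invocation (the `pdsyncbal` host name comes from the previous slide):
```bash
# rrsync rewrites the remote path to live under
# /nfs/btcache/conda_envs/sotbtdata, no matter what the client asks for
rsync -av your-project-1.0.0-py38.squashfs \
    sotbtdata@pdsyncbal:your-project-1.0.0-py38.squashfs
```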
---
## Jenkins Deploy (1)
**Goal**:
- No manual steps to create or update your jenkins pipeline.
- But developers should have full power to make the pipeline behave as they choose.
**Solution**:
- Define your pipeline in your own gitlab repo that contains your scripts.
- Use the Jenkins REST API to publish it to Backtest
  (via `curl` commands, since the Python API wrappers are underwhelming; see the sketch below)
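A minimal sketch of the create-or-update call, assuming `$JENKINS_URL`, `$JENKINS_USER`, and an API token are provided as CI variables; `createItem` and `job/<name>/config.xml` are the stock Jenkins REST endpoints:
```bash
JOB=my-backtest-job   # illustrative job name

if curl -sf -u "$JENKINS_USER:$JENKINS_TOKEN" \
        -o /dev/null "$JENKINS_URL/job/$JOB/config.xml"; then
    # job already exists: push the rendered config over it
    curl -sf -u "$JENKINS_USER:$JENKINS_TOKEN" -X POST \
        -H "Content-Type: application/xml" --data-binary @config.xml \
        "$JENKINS_URL/job/$JOB/config.xml"
else
    # first deploy: create the job
    curl -sf -u "$JENKINS_USER:$JENKINS_TOKEN" -X POST \
        -H "Content-Type: application/xml" --data-binary @config.xml \
        "$JENKINS_URL/createItem?name=$JOB"
fi
```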
---
## Jenkins Deploy (2 - Job Types)
- Freestyle pipelines
  - Most of our manual pipelines are of this type
- Declarative pipelines
  - Use Groovy syntax defined in a `Jenkinsfile`
  - Better than XML at least, but we'd prefer YAML
- In either case, the final artifact Jenkins consumes is a `config.xml`
---
## Jenkins Deploy (3)
### What approach?
- `config.xml` is gross; we can't author it directly
- Developers should be able to define not just the pipeline script
  but also the build parameters
- Jenkins needs some way to run the latest version
  without the developer manually editing a version parameter
---
## Jenkins Deploy (4)
- Developers must define
  - `Jenkinsfile`: the build parameters and pipeline script
  - `description.html`: job description and `@owner` annotations
  - `jenkins-config.xml`: a Jenkins config template
- Why not use YAML?
- Why not use a shell script directly with a freestyle project?
---
## Jenkins Deploy (5 - Rendering)
- The GitLab Jenkins deploy stage will
  - Use Jinja to render the final `config.xml` (sketch below)
  - Deploy it via the REST API, creating a new Jenkins job or updating the existing one
- *NOTE*: the same conda environment we deployed earlier is also used by Jenkins
  - Same with Condor
- There is a WIP [cookiecutter template](https://gitlab.ds.susq.com/eot/qed/sprc/jenkins-cookiecutter) to bootstrap a project
  (but forking an existing project is arguably easier)
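A sketch of the render step, assuming `jinja2` is installed in the CI image and that the template references these (illustrative) variable names:
```bash
python - <<'EOF'
# render jenkins-config.xml -> config.xml, embedding the pipeline script
# and job description straight from the repo
from pathlib import Path
import jinja2

template = jinja2.Template(Path("jenkins-config.xml").read_text())
Path("config.xml").write_text(template.render(
    pipeline_script=Path("Jenkinsfile").read_text(),
    description=Path("description.html").read_text(),
))
EOF
```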
---
## Notes on squashfs (1)
### Pros
- Acts just like `tar` but is a mountable filesystem using FUSE (mount sketch below)
- Fast compression with a multicore encoder/decoder
- Immutable after being mounted (and still compressed)
- Read access over filers is much faster (it's just one file descriptor)
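For illustration, a mount round trip with `squashfuse` (the mount point must match the path the env was built at; see the next slide):
```bash
mkdir -p "$SCRATCH/your-project-1.0.0-py38"
squashfuse your-project-1.0.0-py38.squashfs "$SCRATCH/your-project-1.0.0-py38"

# while mounted, the env behaves like a normal read-only directory
"$SCRATCH/your-project-1.0.0-py38/bin/python" --version

fusermount -u "$SCRATCH/your-project-1.0.0-py38"   # unmount when done
```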
---
## Notes on squashfs (2)
### Cons
- Has to be mounted using `fuse`
- If you have to modify it, you need to unsquash/edit/resquash (see below)
- If the squashfs holds a conda environment, the mount point has to be the same path the env was built at
- Issues mounting a squashfs concurrently under Condor
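The edit cycle from the second bullet, sketched (the version bump is illustrative):
```bash
unsquashfs -d env-tmp your-project-1.0.0-py38.squashfs            # unpack
# ...edit files under env-tmp/...
mksquashfs env-tmp your-project-1.0.1-py38.squashfs -comp zstd    # repack
rm -rf env-tmp
```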
---
## Future Work
- Introduce a TPO / Daphne deployment stage
- Guard stages with gauntlet integration
- Only move datasets to prod locations after gauntlet phase
- Developer Overrides
- Introduce Jenkins build parameters to override env, script, output folder locations
- Clean up older conda environments (maybe an A/B setup?)
- Replace Jenkins w/Airflow?