Build System in GitHub Actions

# Build System in GitHub Actions On every new commit to a PR or push to master we run a lot of checks to ensure the quality of the code. However, these checks are starting to take a long time to complete. We run 5 checks on every PR commit and these are the times they take to complete: | Check Name | Time | Example | | -------- | -------- | -------- | | lint code base | 2m47s | https://github.com/spiceai/spiceai/actions/runs/1401679314 | | Check for forbidden licenses | 3m4s | https://github.com/spiceai/spiceai/actions/runs/1401679313 | | pr_test | 4m4s | https://github.com/spiceai/spiceai/actions/runs/1401679312 | | e2e_test | 9m50s | https://github.com/spiceai/spiceai/actions/runs/1401679311 | | CodeQL | 9m42s | https://github.com/spiceai/spiceai/actions/runs/1401679310 | The **fastest** check we have takes just under 3 minutes, and the slowest goes up to 10 minutes. This is unacceptable. What makes it worse is that the longest check we have, the `e2e_test` runs several tests in parallel. So if there are multiple commits pushed to the repo in quick succession (even for something as simple as updating release notes) it will cause a large backlog that can take upwards of 30+ minutes to resolve. We can only have [20 concurrent](https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration#usage-limits) jobs running at a single time. ## Goal We want to ensure that every commit that makes it into trunk is well-tested, and we also want to enable fast feedback loops for development. The way we ensure every commit is well tested is by using [required status checks](https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/defining-the-mergeability-of-pull-requests/about-protected-branches#require-status-checks-before-merging) for all of our PR checks. This also allows us to use the [Auto-Merge](https://docs.github.com/en/github/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/automatically-merging-a-pull-request) feature safely. The downside of this approach is we cannot use the [paths-ignore](https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#example-ignoring-paths) condition on a GitHub workflow, since the workflow will not run if a path isn't matched. If that happens, the check will never report as successful and GitHub won't allow the PR to be merged since it is missing a required status check. So our current state is that we are naively running all of the PR checks even when there are no changes (i.e. docs/release-notes/version.txt, etc) that would cause a test failure. Even just creating a release branch off of trunk will re-run all of the checks unnecessarily. A good solution would be support from GitHub to make a PR check required only if it is run. Unfortunately, [this is not supported](https://github.community/t/status-checks-required-if-run/117330). Another way to solve for this would be get the list of changed files and make a decision on whether to continue the workflow. (Similar to the `paths-exclude`, but the workflow itself would run first). There is a way to get the [list of changes files](https://github.com/tj-actions/changed-files) but there isn't a way to [exit early from a workflow](https://github.community/t/how-to-perform-an-early-clean-exit-of-a-job/17531). You can decide to run an individual step based on a conditional, but then you would need to add this conditional to every step in the workflow. A final solution that complements the above "changed files" solution would be to use the `actions/cache` action with the commit sha and workflow name as the cache key. If there is a cache hit, it means the workflow was previously run successfully on this exact set of files, so there is no need to re-run the check. It runs into the same limitations as the "changed files" solution as you would need to check in every subsequent step whether this cache hit happened or not. ## Solution The best compromise given all of this context is to: 1. Get the list of changed files for linting, license checking, e2e test and pr_test. 2. Cache the current commit sha using `actions/cache` 3. If a cache hit occurs or it doesn't make sense to continue with the check based on the files changes, set an env var `SKIP_JOB` 4. Add an `if` conditional to every subsequent step that checks for `SKIP_JOB`. ## Final Thoughts As a startup, we need to be thoughtful in how we are spending our resources and energy. The `CodeQL` PR check adds little value compared to the cost (almost 10 minutes) to every PR. I propose we remove this from PR checks and leave it as a scheduled job. This way we can get the benefit of the analysis, but given the low chance of this failing on any individual PR, allows us to be more agile. I also don't think we need all of the e2e test matrix to run on every PR - but I think that is fine to leave for now.