# Section 14: Advanced GitHub Actions Optimization Techniques

# 128. Locking requirements

## **Dependency Locking in CI/CD Pipelines (GitHub Actions)**

### **What Is Dependency Locking?**

- Freezing the exact versions of all installed packages (direct + transitive).
- Example: `pip freeze > requirements.txt`
- Tools include:
    - `pip freeze` (basic)
    - `pip-tools` (advanced, readable)
    - `pipenv` (more metadata, graph visualization)
    - `poetry` (own lock format, integrated)

---

### **Benefits**

- **Reproducibility**: Everyone installs the **exact same versions**.
- **Stability**: Avoid unplanned breakage due to dependency updates.
- **Auditability**: Document what was used and when.

---

### **Trade-offs**

- **Less visibility into new updates** (security, performance).
- **May hide broken dependencies** that customers will encounter.
- **Lock files are Python-version-specific**, so installs may fail under a different interpreter.

---

### **When to Use Locked Dependencies in CI**

- Use locked dependencies if:
    - You **must guarantee stability** (e.g., clinical or financial applications).
    - You manage **production services** with strict compatibility needs.
    - You're pinning dependencies for a **fixed environment (e.g., Docker)**.
- Otherwise:
    - Default to installing the latest packages to catch breaking changes early.
    - Let CI alert you to upcoming compatibility issues.

---

### **How to Use Lock Files in GitHub Actions**

```yaml
steps:
  - name: Install pinned requirements
    run: pip install -r requirements.txt
  - name: Install remaining dependencies (if needed)
    run: pip install .
```

- Pip will **reuse** what's already installed (if versions match).
- If requirements are pinned, this helps avoid re-installation and version drift.
- Combined with dependency **caching**, this keeps builds deterministic.

---

### **Best Practices**

- Use `pip-tools` to separate `requirements.in` (unpinned) from `requirements.txt` (frozen); see the sketch after this list.
- Include a comment noting the Python version used:
  ```
  # Python 3.11.8
  ```
- Don't pin versions directly in `pyproject.toml` (especially for libraries).
- Use lock files for **applications**, not **libraries**.
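As a concrete example of the `pip-tools` flow, here is a minimal sketch of a CI step that regenerates the lock file and fails when it has drifted out of sync with `requirements.in`. The file names follow pip-tools defaults (`pip-compile` reads `requirements.in` and writes `requirements.txt`); the step itself is an illustrative assumption, not part of the course pipeline:

```yaml
# Sketch: fail the build if requirements.txt is stale relative to requirements.in.
- name: Check lock file freshness
  run: |
    pip install pip-tools
    pip-compile requirements.in            # regenerates requirements.txt from the unpinned spec
    git diff --exit-code requirements.txt  # non-zero exit (= failure) if the lock file changed
```

Running a check like this means a teammate who edits `requirements.in` without re-compiling the lock file gets an immediate, actionable CI failure instead of silent version drift.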
# 129. Dependency Caching

### **Why Use Dependency Caching in CI?**

- Each GitHub Actions job runs in a **fresh container** (ephemeral file system).
- This means:
    - Dependencies installed with `pip` are **discarded after each run**.
    - Builds re-download and re-install packages **from scratch** every time.
    - This slows down pipelines and puts **unnecessary network load** on PyPI.
- Caching dependencies reduces:
    - **Install time**
    - **Bandwidth usage**
    - **Build cost (especially for large projects)**

---

### **GitHub's Official Caching Action**

- Action: [`actions/cache`](https://github.com/actions/cache)
- Lets you **cache specific directories or files** across workflow runs.

---

### **Basic Example (Python Pip Cache)**

```yaml
- name: Cache pip dependencies
  uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-
```

### Breakdown:

- `path`: Where pip stores its cached files (`~/.cache/pip`)
- `key`: Defines when the cache is reused vs. invalidated
    - Includes the OS and a **hash of `requirements.txt`**
    - Changing any line in the file → new hash → new cache
- `restore-keys`: Fallback prefix used if no exact match is found

### **Where to Place the Cache Step?**

- Add it **before** `pip install`, ideally:

```yaml
- uses: actions/setup-python@v4
- name: Cache pip dependencies
  ...
- name: Install Python dependencies
  run: pip install -r requirements.txt
```

---

### **Advanced Use Case: Caching for `pyproject.toml`**

```yaml
key: ${{ runner.os }}-pip-${{ hashFiles('**/pyproject.toml', '**/requirements.txt') }}
```

---

### **Important Notes**

- GitHub automatically evicts caches that haven't been accessed for ~7 days.
- Caching does not guarantee faster builds if:
    - Upload/download time exceeds install time
    - The cache is invalidated too frequently

---

### **Benefits**

- Speeds up iterative development (especially for larger packages)
- Reduces load on PyPI and CI systems
- Encourages modular, testable GitHub Actions workflows

# 130. Parallelization

### Why Parallelization Helps

- Running **all steps sequentially** slows down your workflow:
    - You wait for each step to finish before discovering the next failure.
    - Example: Tag fails ➝ fix ➝ Lint fails ➝ fix ➝ Tests fail ➝ fix = multiple runs
- **Parallel jobs** let you:
    - Detect **all errors in a single run**
    - **Save time** by running multiple jobs simultaneously
    - **Shorten the feedback loop** for developers

---

### 🛠️ Real-World Example

- A large team's CI pipeline:
    - Linting, tagging, and testing all ran sequentially
    - Time was wasted re-running the workflow for each failed stage
    - It could have been much faster with **parallelized checks**

---

### GitHub Actions Supports Parallel Jobs by Default

- Each top-level job under the `jobs:` key runs on its **own virtual machine (VM)**
- Unless you explicitly use `needs:`, jobs **run in parallel**

---

### How to Parallelize

### Split Steps into Separate Jobs:

```yaml
jobs:
  build-test-publish:
    runs-on: ubuntu-latest
    steps:
      # build, test, publish logic

  check-version:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Check tag
        run: git tag some-tag  # placeholder for a real tag/version check
```

### Notes:

- `fetch-depth: 0` is needed so `git tag` can access the **full tag history**
- The `check-version` job runs **independently** of the `build-test-publish` job
- They run **simultaneously** (in parallel)

---

### Optimization Tip

- Remove `fetch-depth: 0` from the `build-test-publish` job if it isn't needed there
    - Reduces fetch time
    - Makes the build job more efficient

---

### Extending This Pattern

You can extract other steps, such as linting, into their own jobs too:

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run linter
        run: ./run.sh lint:ci
```

---

### Coordinating Jobs with `needs:`

- Use `needs:` to **define dependencies** when one job must wait for another

```yaml
jobs:
  build:
    ...
  test:
    needs: build
    ...
```

---

### Result

- Instead of waiting 3–5 minutes per job sequentially...
- Parallel jobs can give you **feedback in 1–2 minutes total**
- This leads to:
    - Fewer iterations
    - Faster debugging
    - Happier developers
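Putting this section together, a minimal workflow with parallel jobs might look like the sketch below. The job names and the `./run.sh lint:ci` entry point are carried over from the examples above; the trigger and the placeholder commands are assumptions for illustration:

```yaml
name: ci
on: push  # assumed trigger for illustration

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
      - run: pip install -r requirements.txt
      - run: ./run.sh lint:ci        # lint entry point from the example above

  check-version:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0             # full history so all tags are available
      - run: git tag                 # placeholder: a real check compares tags to the package version

  build:
    needs: [lint, check-version]     # waits for both parallel jobs above
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: echo "build logic here" # placeholder build step
```

Here `lint` and `check-version` start simultaneously on separate VMs, so a single run surfaces failures from both; only `build` waits, because of its `needs:` list.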
# 131. Passing artifacts (files) between jobs

## **Goals of This Optimization**

- **Split the `build` and `publish` jobs** so they can run in parallel and the condition logic stays simple.
- Use **`upload-artifact` and `download-artifact`** to pass built files between jobs.
- Add **caching** to avoid re-installing Python packages and save time.
- Prepare the pipeline for adding **tests** in the next phase.

---

## Step-by-Step Improvements

### 1. **Split Jobs**

- Split `build-test-publish` into:
    - `build-wheel-and-sdist`
    - `publish-to-pypi`
- Advantages:
    - Each job can run in **parallel**
    - Cleaner conditional logic using `if:` on the entire job

### 2. **Upload Artifacts**

- After building in the `build` job:

```yaml
- name: Upload package artifacts
  uses: actions/upload-artifact@v3
  with:
    name: wheel-and-sdist
    path: ./dist/
```

### 3. **Download Artifacts**

- In the `publish` job:

```yaml
- name: Download wheel and sdist
  uses: actions/download-artifact@v3
  with:
    name: wheel-and-sdist
    path: ./dist/
```

### 4. **Control Publish Logic with `needs:`**

```yaml
publish:
  needs:
    - build
    - lint
    - check-version
```

(A condensed sketch of the full split pipeline appears at the end of this section.)

---

## Caching Python Dependencies

### Use the Official GitHub `actions/cache`:

Add a step like this before installing:

```yaml
- name: Cache pip dependencies
  uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('pyproject.toml') }}
```

### Cache Strategy Summary:

| Job | Cache Key | When to Invalidate? |
| --- | --- | --- |
| `lint` | Hash of `.pre-commit-config.yaml` | On change to the pre-commit config |
| `build` | Hardcoded `"build"` | Rarely changes; fine for one package |
| `publish` | Hardcoded `"twine"` | Similar to `build` |

---

## Key Benefits Achieved

- **Modular jobs**: Easier debugging, faster runs
- **Artifact sharing**: Keeps build and publish logic separate
- **Parallel execution**: Faster feedback cycle for developers
- **Smarter caching**: Saves time across runs
- **Safe publishing**: Version checks, tagging, and release control

---

## What's Missing?

- **No tests yet!** Even though we've built and published packages, we haven't actually run or validated the code. This will be addressed in the **next section**.

---

## Final Thoughts

If you've followed this all the way:

- You now know **95% of practical GitHub Actions workflows**
- You understand:
    - Workflow structure
    - Parallelization
    - Artifact sharing
    - Conditional logic
    - Caching
    - Deployment
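To close the loop on this section's pipeline, here is a condensed, hedged sketch of the split build/publish workflow described above. The `python -m build` and `twine upload` commands, the tag-based `if:` condition, and the trigger are assumptions; `lint` and `check-version` appear as stubs standing in for the real jobs from the earlier sections:

```yaml
name: release  # illustrative workflow name
on: push

jobs:
  build-wheel-and-sdist:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
      - run: |
          pip install build  # assumed build frontend
          python -m build    # writes the wheel and sdist into ./dist/
      - uses: actions/upload-artifact@v3
        with:
          name: wheel-and-sdist
          path: ./dist/

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: echo "lint logic here"  # stub; see section 130 for the real job

  check-version:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - run: git tag  # stub tag check; see section 130

  publish-to-pypi:
    needs: [build-wheel-and-sdist, lint, check-version]
    if: startsWith(github.ref, 'refs/tags/')  # assumption: publish only on tag pushes
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v3
        with:
          name: wheel-and-sdist
          path: ./dist/
      - run: |
          pip install twine
          twine upload dist/*  # requires PyPI credentials configured as repository secrets
```

Because `publish-to-pypi` never rebuilds anything, the exact files produced by the build job are the ones published, which is the main point of passing artifacts instead of rebuilding in every job.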