# Section 14: Advanced GitHub Actions Optimization Techniques
# 128. Locking requirements
## **Dependency Locking in CI/CD Pipelines (GitHub Actions)**
### **What Is Dependency Locking?**
- Freezing exact versions of all installed packages (direct + transitive).
- Example: `pip freeze > requirements.txt` (output sketched after this list)
- Tools include:
  - `pip freeze` (basic)
  - `pip-tools` (advanced, readable)
  - `pipenv` (more metadata, graph visualization)
  - `poetry` (own lock format, integrated)
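For illustration, a frozen `requirements.txt` pins every package, including transitive dependencies (the package names and versions below are made-up examples):
```
# Python 3.11.8
requests==2.31.0
certifi==2024.2.2
idna==3.6
urllib3==2.2.1
```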
---
### **Benefits**
- **Reproducibility**: Everyone installs **exact same versions**.
- **Stability**: Avoid unplanned breaks due to dependency updates.
- **Auditability**: Document what was used and when.
---
### **Trade-offs**
- **Less visibility into new updates** (security, performance).
- **May mask breakage in newer dependency versions** that users installing unpinned will still encounter.
- **Lock file is Python-version-specific**, so may fail with different interpreters.
---
### **When to Use Locked Dependencies in CI**
- Lock dependencies when:
  - You **must guarantee stability** (e.g., clinical or financial applications).
  - You manage **production services** with strict compatibility needs.
  - You're pinning dependencies for a **fixed environment (e.g., Docker)**.
- Otherwise:
  - Default to installing the latest packages to catch breaking changes early (see the scheduled-run sketch after this list).
  - Let CI alert you to compatibility issues before your users hit them.
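One common pattern for the unpinned route is a scheduled workflow that installs the latest versions and runs the tests, so CI surfaces breakage before your users do. A minimal sketch (the cron schedule, Python version, and `pytest` entry point are assumptions):
```yaml
on:
  schedule:
    - cron: "0 6 * * 1"  # assumed: every Monday at 06:00 UTC
jobs:
  test-latest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Install latest (unpinned) dependencies
        run: pip install .
      - name: Run tests against the latest versions
        run: pytest
```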
---
### **How to Use Lock Files in GitHub Actions**
```yaml
steps:
  - name: Install pinned requirements
    run: pip install -r requirements.txt
  - name: Install remaining dependencies (if needed)
    run: pip install .
```
- Pip will **reuse** what’s already installed (if versions match).
- If requirements are pinned, this helps avoid re-installation or version drift.
- Combined with dependency **caching**, pinned requirements keep builds both fast and deterministic.
---
### **Best Practices**
- Use `pip-tools` to separate `requirements.in` (unpinned) and `requirements.txt` (frozen); see the sketch after this list.
- Include a comment noting the Python version used to generate the lock file:
```
# Python 3.11.8
```
- Don't pin versions directly in `pyproject.toml` (especially for libraries).
- Use lock files for **applications**, not **libraries**.
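A sketch of the `pip-tools` flow, assuming a `requirements.in` at the repository root that lists only top-level dependencies (typically run locally or in a scheduled job, with the result committed):
```yaml
- name: Compile lock file with pip-tools
  run: |
    pip install pip-tools
    # Reads unpinned deps from requirements.in, writes a fully pinned requirements.txt
    pip-compile requirements.in --output-file requirements.txt
```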
# 129. Dependency Caching
### **Why Use Dependency Caching in CI?**
- Each GitHub Actions job runs on a **fresh, ephemeral runner** (its file system is discarded afterward).
- This means:
  - Dependencies installed with `pip` are **discarded after each run**.
  - Builds re-download and re-install packages **from scratch** every time.
  - This slows down pipelines and puts **unnecessary network load** on PyPI.
- Caching dependencies reduces:
  - **Install time**
  - **Bandwidth usage**
  - **Build cost** (especially for large projects)
---
### **GitHub's Official Caching Action**
- Action: [`actions/cache`](https://github.com/actions/cache)
- Lets you **cache specific directories or files** across workflow runs.
---
### **Basic Example (Python Pip Cache)**
```yaml
- name: Cache pip dependencies
  uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-
```
### Breakdown:
- `path`: Where pip stores its cached files (`~/.cache/pip`)
- `key`: Defines when the cache is reused vs. invalidated
  - Includes the OS and a **hash of requirements.txt**
  - Changing any line in the file → new hash → new cache
- `restore-keys`: Fallback if an exact match isn't found
### **Where to Place the Cache Step?**
- Add **before** `pip install`, ideally:
```yaml
- uses: actions/setup-python@v4
- name: Cache pip dependencies
  ...
- name: Install Python dependencies
  run: pip install -r requirements.txt
```
---
### **Advanced Use Case: Caching for `pyproject.toml`**
```yaml
key: ${{ runner.os }}-pip-${{ hashFiles('**/pyproject.toml', '**/requirements.txt') }}
```
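Relatedly, `actions/setup-python` (v4 and later) can manage the pip cache by itself, replacing the manual `actions/cache` step for many projects. A minimal sketch (the `cache-dependency-path` value is an assumption for a `pyproject.toml`-driven project):
```yaml
- uses: actions/setup-python@v4
  with:
    python-version: "3.11"
    cache: pip  # caches pip's download cache automatically
    cache-dependency-path: pyproject.toml  # key the cache on this file's hash
```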
---
### **Important Notes**
- Caches are automatically expired by GitHub after ~7 days of inactivity.
- Caching does not guarantee faster builds if:
- Upload/download time > install time
- The cache is too frequently invalidated
---
### **Benefits**
- Speeds up iterative development (especially for larger packages)
- Reduces load on PyPI and CI systems
- Encourages modular, testable GitHub Actions workflows
# 130. Parallelization
### Why Parallelization Helps
- Running **all steps sequentially** slows down your workflow:
  - You wait for each step to finish before discovering the next failure.
  - Example: Tag fails ➝ fix ➝ Lint fails ➝ fix ➝ Tests fail ➝ fix = multiple runs
- **Parallel jobs** allow you to:
  - Detect **all errors in a single run**
  - **Save time** by running multiple jobs simultaneously
  - **Reduce feedback loop time** for developers
---
### 🛠️ Real-World Example
- A large team’s CI pipeline:
  - Linting, tagging, testing all done sequentially
  - Wasted time re-running workflows for each failed stage
  - Could have been faster with **parallelized checks**
---
### GitHub Actions Supports Parallel Jobs by Default
- Each top-level job under `jobs:` in a workflow file runs on its **own virtual machine** (VM)
- Unless you explicitly connect jobs with `needs:`, they will **run in parallel**
---
### How to Parallelize
### Split Steps into Separate Jobs:
```yaml
jobs:
  build-test-publish:
    runs-on: ubuntu-latest
    steps:
      # build, test, publish logic
  check-version:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Check tag
        run: git tag some-tag
```
### Notes:
- `fetch-depth: 0` is needed for `git tag` to access the **full tag history**
- The `check-version` job runs **independently** of the `build-test-publish` job
- The two jobs run **simultaneously** (in parallel)
- `git tag some-tag` is a placeholder; a more realistic version check is sketched below
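In a real pipeline, the `Check tag` step would typically verify that the package's current version has not been tagged yet. A sketch under stated assumptions (Python 3.11+ for `tomllib`, a `[project] version` field in `pyproject.toml`, and a `v`-prefixed tag convention):
```yaml
- name: Check that the version is not already tagged
  run: |
    VERSION="$(python -c 'import tomllib; print(tomllib.load(open("pyproject.toml", "rb"))["project"]["version"])')"
    if git tag --list | grep -qx "v${VERSION}"; then
      echo "Tag v${VERSION} already exists; bump the version first." >&2
      exit 1
    fi
```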
---
### Optimization Tip
- Remove `fetch-depth: 0` from the `build-test-publish` job if not needed there
  - Reduces fetch time
  - Makes the build job more efficient
---
### Extending This Pattern
You can extract other steps like Linting into their own jobs too:
```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run linter
        run: ./run.sh lint:ci
```
---
### Coordinating Jobs with `needs:`
- Use `needs:` to **define dependencies** when one job must wait for another; every job listed under `needs:` must finish successfully before the dependent job starts
```yaml
jobs:
  build:
    ...
  test:
    needs: build
    ...
```
---
### Result
- Instead of waiting 3–5 minutes per job sequentially...
- Parallel jobs can give you **feedback in 1–2 minutes total**
- This leads to:
  - Fewer iterations
  - Faster debugging
  - Happier developers
# 131. Passing artifacts (files) between jobs
## **Goals of This Optimization**
- **Split the `build` and `publish` jobs** so that building can run in parallel with the other checks and the publish condition becomes a simple job-level `if:`.
- Use **`upload-artifact` and `download-artifact`** to pass built files between jobs.
- Add **caching** to avoid re-installing Python packages and save time.
- Prepare the pipeline for adding **tests** in the next phase.
---
## Step-by-Step Improvements
### 1. **Split Jobs**
- Split `build-test-publish` into:
  - `build-wheel-and-sdist`
  - `publish-to-pypi`
- Advantages:
  - Each job can run in **parallel** with the other independent jobs
  - Cleaner conditional logic using `if:` on the entire job
### 2. **Upload Artifacts**
- After building in the `build` job:
```yaml
- name: Upload package artifacts
  uses: actions/upload-artifact@v3
  with:
    name: wheel-and-sdist
    path: ./dist/
```
### 3. **Download Artifacts**
- In the `publish` job:
```yaml
- name: Download wheel and sdist
  uses: actions/download-artifact@v3
  with:
    name: wheel-and-sdist
    path: ./dist/
```
### 4. **Control Publish Logic with `needs:`**
```yaml
publish:
  needs:
    - build
    - lint
    - check-version
```
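Putting the pieces together, a sketch of the `publish` job (the tag-triggered `if:` condition and the `PYPI_API_TOKEN` secret name are assumptions, not part of the original pipeline):
```yaml
publish:
  runs-on: ubuntu-latest
  needs: [build, lint, check-version]
  if: startsWith(github.ref, 'refs/tags/')  # assumed: publish only on tag pushes
  steps:
    - name: Download wheel and sdist
      uses: actions/download-artifact@v3
      with:
        name: wheel-and-sdist
        path: ./dist/
    - name: Publish to PyPI
      run: |
        pip install twine
        twine upload --username __token__ --password "${{ secrets.PYPI_API_TOKEN }}" dist/*
```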
---
## Caching Python Dependencies
### Use Official GitHub `actions/cache`:
Add a step like this before installing:
```yaml
- name: Cache pip dependencies
  uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('pyproject.toml') }}
```
### Cache Strategy Summary:
| Job | Cache Key | When to Invalidate? |
| --- | --- | --- |
| `lint` | Hash of `.pre-commit-config.yaml` | On change to pre-commit config |
| `build` | Static key (e.g., `"build"`) | Rarely; fine for a single package |
| `publish` | Static key (e.g., `"twine"`) | Similar to `build` |
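A static key skips file hashing entirely. A sketch of what the `build` job's cache step might look like under this strategy (the key suffix follows the table above):
```yaml
- name: Cache pip dependencies
  uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-build  # static suffix: only an OS change invalidates it
```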
---
## Key Benefits Achieved
- **Modular jobs**: Easier debugging, faster runs
- **Artifact sharing**: Keep build and publish logic separate
- **Parallel execution**: Faster feedback cycle for developers
- **Smarter caching**: Save time across runs
- **Safe publishing**: Version checks, tagging, and release control
---
## What's Missing?
- **No tests yet**! Even though we’ve built and published packages, we haven’t actually run or validated the code. This will be addressed in the **next section**.
---
## Final Thoughts
If you’ve followed this all the way:
- You now know **95% of practical GitHub Actions workflows**
- You understand:
- Workflow structure
- Parallelization
- Artifact sharing
- Conditional logic
- Caching
- Deployment