# Apache Airflow Repository in Subrepo
## Why Subrepo?
We use Subrepo so that we can both pull changes from the Apache Airflow
Helm Chart and contribute our changes back upstream.
This applies to changes to both the Airflow Helm Chart and Airflow itself.
The two live in a single repository, which we synchronize via Subrepo.
Working with Git Subrepo (unlike using submodules) does not require you to
change your regular workflow. Once a particular version of the subrepo is added
(usually as a single squash commit), you continue to work with your
own repository as usual, committing changes to your original repo.
Subrepo records locally the version of the repository you pulled in, and
it lets you synchronize changes back and forth between the code
merged as a subrepo and the upstream repository from which you created
the subrepo.
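As a rough illustration of what Subrepo records locally: the state is kept in a `.gitrepo` file inside the subrepo directory, written in git-config format. The sketch below reads a few of its fields with `git config --file`. The helper name `subrepo_info` is hypothetical, and the default path assumes the `subrepos/airflow` layout used in this repository.

```bash
# Hypothetical helper: print the upstream state recorded for a subrepo.
# Assumes the .gitrepo file has a [subrepo] section with these keys.
subrepo_info() {
  local gitrepo="${1:-subrepos/airflow}/.gitrepo"
  local key
  for key in remote branch commit parent; do
    # git config can read any git-config formatted file via --file
    printf '%s: %s\n' "${key}" "$(git config --file "${gitrepo}" "subrepo.${key}")"
  done
}
```

Calling `subrepo_info subrepos/airflow` prints one `key: value` line per field, where `commit` is the upstream commit that was last pulled in.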
You only need to use subrepo commands when:
* you want to pull the latest changes from the linked repository
* you want to contribute the changes back to the linked repository
More information about subrepo can be found in:
* GitHub home of subrepo: https://github.com/ingydotnet/git-subrepo
* Simple and nice subrepo tutorial:
http://blog.s-schoener.com/2019-04-20-git-subrepo/
* Wiki of subrepo project with basic information:
https://github.com/ingydotnet/git-subrepo/wiki/Basics
## Our Subrepo configuration
In our case the Helm chart is a subdirectory of the main Airflow repository, and
we pull in the whole Apache Airflow project (in `subrepos/airflow`).
To have the Helm chart in the right place in our structure,
we create a symbolic link at `terraform/modules/airflow_tenant/chart`
that points to the `chart` directory of the Airflow project.
We keep the `chart` name rather than the usual `helm`, following the
name of the folder in the Apache Airflow repository.
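A minimal sketch of how such a link could be created. The function name `link_chart` is hypothetical, and the choice of a relative link target (so the link keeps working wherever the repository is checked out) is an assumption, not something the project prescribes.

```bash
# Hypothetical helper: create the chart symlink inside a checkout.
# $1 is the repository root (defaults to the current directory).
link_chart() {
  local repo_root="${1:-.}"
  local link_dir="${repo_root}/terraform/modules/airflow_tenant"
  mkdir -p "${link_dir}"
  # -s: symbolic link, -f: replace an existing link,
  # -n: do not descend into an existing link that points to a directory.
  # The target is relative to the directory that holds the link.
  ln -sfn "../../../subrepos/airflow/chart" "${link_dir}/chart"
}
```

Git stores the link itself rather than a copy of the chart, so the chart content continues to come from the subrepo directory.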
## Workflow examples when interacting with the upstream Airflow repo
### Pulling latest changes from Apache Airflow
The command below pulls the latest changes from the Apache
Airflow upstream repository and merges them with the changes introduced
in our repository. If there are no conflicts, it commits
the result as a single commit and records the history
of merged changes, so that subsequent pulls fetch
only the new changes.
```bash
git subrepo pull subrepos/airflow
```
*NOTE* Open a PR right after you run `subrepo pull`, and do not
merge it together with other commits. Subrepo stores the hash of the parent commit in
the `.gitrepo` file; if that parent commit is rebased or squashed, it disappears
from the history and subrepo cannot find where it should start from.
Here is an actual problem that happened to us:
```
NNNNNNNNN git subrepo pull subrepos/airflow
8c00cd85d fixup! Add explanation about linking airflow helm repository
4ca8884bb fixup! Add explanation about linking airflow helm repository
7673e12cf fixup! Add explanation about linking airflow helm repository
45db18c98 Add chart as symbolic link to airflow's chart from subrepo
0934cd2ff git subrepo clone git@github.com:apache/airflow.git subrepos/airflow
```
After the rebase, the history became:
```
26a2389c3 git subrepo pull subrepos/airflow
2fb7b61ad Add chart as symbolic link to airflow's chart from subrepo
3d10e688c git subrepo clone git@github.com:apache/airflow.git subrepos/airflow
```
Subrepo then could not find the parent commit 8c00cd85d on the next `subrepo pull` run.
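One way to catch this before running the next pull is to check that the parent commit recorded in `.gitrepo` is still reachable from `HEAD`. The sketch below is a hypothetical helper (`check_subrepo_parent` is not part of git-subrepo). It assumes the `parent` key of the `[subrepo]` section in `.gitrepo` and uses `git merge-base --is-ancestor` for the reachability test.

```bash
# Hypothetical helper: verify that the parent commit recorded in .gitrepo
# is still part of the current branch history.
# $1 is the subrepo directory (defaults to subrepos/airflow).
check_subrepo_parent() {
  local subdir="${1:-subrepos/airflow}"
  local parent
  # Read the recorded parent hash; fail with status 2 if it is missing.
  parent="$(git config --file "${subdir}/.gitrepo" subrepo.parent)" || return 2
  # --is-ancestor exits 0 only when the commit exists and is reachable from HEAD.
  if git merge-base --is-ancestor "${parent}" HEAD 2>/dev/null; then
    echo "OK: parent ${parent} is reachable from HEAD"
  else
    echo "MISSING: parent ${parent} is not in the history of HEAD" >&2
    return 1
  fi
}
```

Running it after a rebase or squash, and before the next `git subrepo pull`, would flag a situation like the one above where the recorded parent no longer exists in the history.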
### Pushing changes back to fork of upstream repo
The command below extracts the changes that exist only locally
and are not yet present in the Apache Airflow repository. It
pushes `<BRANCH_NAME>` to your own fork of Apache Airflow, so that
you can prepare a PR from that branch.
Once the PR is merged to the main Apache Airflow repository, the same
change can be synced back with a subsequent `pull` command. That pull
is a no-op if the PR was merged without any changes, and it requires
conflict resolution if the PR was merged with some changes.
```bash
git remote add airflow-fork git@github.com:<YOUR_USER>/airflow.git
git subrepo push subrepos/airflow -r airflow-fork -b <BRANCH_NAME>
```
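The `git remote add` step fails if the remote already exists, which matters when the commands are run repeatedly. A small sketch of a rerunnable variant follows; `ensure_remote` is a hypothetical helper name introduced here, and it only relies on the standard `git remote get-url`, `set-url` and `add` subcommands.

```bash
# Hypothetical helper: add a remote, or update its URL if it already exists.
ensure_remote() {
  local name="$1"
  local url="$2"
  if git remote get-url "${name}" >/dev/null 2>&1; then
    # The remote is already configured: point it at the requested URL.
    git remote set-url "${name}" "${url}"
  else
    git remote add "${name}" "${url}"
  fi
}
```

With it, the sequence becomes `ensure_remote airflow-fork git@github.com:<YOUR_USER>/airflow.git` followed by the same `git subrepo push` command as above.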
### The subrepo flow
The subrepo flow is described by the diagram source in `resources/subrepo-flow.mermaid`.

----
## Diagrams
We are using Mermaid to generate our diagrams from text sources.
More info at https://mermaidjs.github.io/#/.
To get the Mermaid CLI, install the development packages with yarn:
1. Run `yarn install --dev`
You can generate any of the mermaid diagrams via:
```bash
file=resources/<FILENAME>.mermaid
"node_modules/.bin/mmdc" \
-i "${file}" \
-o "$(dirname "${file}")/$(basename "${file}" .mermaid).png" \
-c "resources/mermaid-config.json"
```
You can generate diagrams for all files added to git staging via:
```bash
pre-commit run mermaid
```
You can generate all changed diagrams via:
```bash
pre-commit run mermaid --all-files
```
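Only changed diagrams are regenerated because `scripts/mermaid.sh` compares each source file against a checksum stored next to it. The sketch below isolates that check. The helper names `checksum_file_for`, `needs_regeneration` and `record_checksum` are introduced here for illustration and do not exist in the script, which performs the same comparison inline.

```bash
# Path of the checksum file kept next to the diagram source.
checksum_file_for() {
  local file="$1"
  echo "$(dirname "${file}")/$(basename "${file}" .mermaid).md5"
}

# Succeeds (exit 0) when the source has no stored checksum or differs from it.
needs_regeneration() {
  local file="$1"
  local md5sum_file
  md5sum_file="$(checksum_file_for "${file}")"
  if [ ! -f "${md5sum_file}" ]; then
    return 0
  fi
  [ "$(cat "${md5sum_file}")" != "$(md5sum "${file}")" ]
}

# Store the current checksum so the next run can skip an unchanged file.
record_checksum() {
  local file="$1"
  md5sum "${file}" > "$(checksum_file_for "${file}")"
}
```

A caller would run `needs_regeneration "${file}"`, invoke `mmdc` only when it succeeds, and then call `record_checksum "${file}"`.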
----
resources/subrepo-flow.mermaid
```
graph TD
subgraph Subrepo workflow
A[Airflow repo] -->|" subrepo pull "| B(Local Clone)
B --> |" git push " | C[pso-google-sfdc-airflow-aas repository]
C --> |" git pull " | B
B --> |" subrepo push " | D[Airflow Fork]
D --> |" Merge PR "| A
style A fill:#69758f
style A color:#ffffff
style D fill:#69758f
style D color:#ffffff
style C fill:#69758f
style C color:#ffffff
end
```
---
resources/mermaid-config.json
```json
{
"theme": "default",
"themeCSS": ".label foreignObject { overflow: visible; }"
}
```
---
scripts/mermaid.sh
```bash
#!/usr/bin/env bash
# Copyright 2020 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#
# This software is provided as-is,
# without warranty or representation for any use or purpose.
# Your use of it is subject to your agreement with Google.
set -eao pipefail
PROJECT_SOURCES="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && cd .. && pwd )"
for file in "${@}"
do
basename_file="$(dirname "${file}")/$(basename "${file}" .mermaid)"
md5sum_file="${basename_file}.md5"
if ! diff "${md5sum_file}" <(md5sum "${file}"); then
md5sum "${file}" >"${md5sum_file}"
echo "Running generation for ${file}"
"${NODE_VIRTUAL_ENV}/bin/mmdc" \
-i "${file}" \
-o "$(dirname "${file}")/$(basename "${file}" .mermaid).png" \
-c "${PROJECT_SOURCES}/resources/mermaid-config.json"
fi
done
```
---
.pre-commit.yaml
```yaml
- id: mermaid
name: Generate images
entry: ./scripts/mermaid.sh
additional_dependencies: ['mermaid.cli']
language: node
exclude: ^subrepos
```