# NCHC Project: eFlows4HPC PoC
This document describes the steps for setting up live instances of eFlows4HPC services.
**Warning: After this PoC, I have concluded that eFlows4HPC is a premature project (alpha quality at best). Documentation is severely lacking and some functionality does not work at all. Do not use it in production.**
## Preparation
* Prepare 3 virtual machines (4 vCPUs and 32 GB of memory each are more than enough; Ubuntu 22.04 LTS is used here):
* Hostname `eflows4hpc0ap` (Application server)
* Hostname `eflows4hpc0cn1` (Compute node 1, Alien 4 Cloud frontend)
* Hostname `eflows4hpc0cn2` (Compute node 2)
* Add the following entries to `/etc/hosts` on all three machines
```=
192.168.211.24 eflows4hpc0ap
192.168.211.31 eflows4hpc0cn1
192.168.211.25 eflows4hpc0cn2
```
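* Optionally verify that all three names resolve and respond (a quick sketch; assumes ICMP is allowed between the VMs)
```bash
# Ping each host once by name
for h in eflows4hpc0ap eflows4hpc0cn1 eflows4hpc0cn2; do
  ping -c 1 "$h" > /dev/null && echo "$h reachable"
done
```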
* Perform a system upgrade on all of them
```bash
sudo apt update
sudo apt upgrade
```
## For `eflows4hpc0cn1`, `eflows4hpc0cn2`
* Allow the deprecated `ssh-rsa` signature algorithm
This is required by Ystia Orchestrator, which still uses an outdated version of the Paramiko SSH library; deployment fails early if `ssh-rsa` is not allowed.
```bash
echo "PubkeyAcceptedAlgorithms=+ssh-rsa" | sudo tee /etc/ssh/sshd_config.d/allow-ssh-rsa.conf
sudo systemctl restart sshd
```
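* Optionally confirm that the setting is in effect
A quick check (sketch): `sshd -T` prints the effective server configuration, which should now list `ssh-rsa` among the accepted public key algorithms.
```bash
sudo sshd -T | grep -i pubkeyacceptedalgorithms
```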
## For `eflows4hpc0ap`
### Install Docker
* Install Docker from its official `apt` repository
* Configure the repository
```bash
sudo apt install ca-certificates curl gnupg lsb-release
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
```
* Install Docker
```bash
sudo apt update
sudo apt install dbus-user-session fuse-overlayfs slirp4netns uidmap
sudo apt install docker-ce docker-ce-cli docker-ce-rootless-extras containerd.io docker-compose-plugin
```
* Enable and start Docker services
```bash
sudo systemctl enable --now containerd.service
sudo systemctl enable --now docker.service
```
* Configure Docker in **rootless** mode for better security
```bash
dockerd-rootless-setuptool.sh install
# Allow binding privileged ports
sudo setcap cap_net_bind_service=ep $(which rootlesskit)
systemctl --user restart docker
# Allow automatic startup
sudo loginctl enable-linger $(whoami)
```
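* Point the `docker` CLI at the rootless daemon
The setup tool prints the exact environment to use; it is typically something like the following (a sketch, adjust the socket path to what the tool reports), which should go into `~/.bashrc` so that all later `docker` commands in this document talk to the rootless daemon.
```bash
# Use the rootless daemon socket instead of the system-wide one
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock
# Smoke test: "rootless" should appear among the security options
docker info | grep -i rootless
```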
### Set Up eFlows4HPC Services
* Install necessary python packages
```bash
sudo apt install python3-dev python3-pip python3-venv
```
* Pull necessary container images
```bash
docker pull httpd:latest
docker pull registry.jsc.fz-juelich.de/eflows4hpc-wp2/datacatalog:stable-0.32
docker pull registry.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/eflows-airflow:latest
```
* `git clone` necessary code repositories
```bash
git clone https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/datacatalog.git
git clone https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service.git
```
#### Data Catalog
* Create python virtual environment `dc-env` and install prerequisites inside (run the following from the cloned `datacatalog` directory, where `requirements.txt` lives)
```bash
python3 -m venv dc-env
source dc-env/bin/activate
python3 -m pip install passlib pydantic wheel
python3 -m pip install -r requirements.txt
```
* Apply patch
Correctly configure Cross-Origin Resource Sharing (CORS) for the backend (API server); without this, the frontend (static website) simply does not work because of errors upon user login. The patch also fixes the API documentation link on the frontend index page.
```diff
diff --git a/apiserver/main.py b/apiserver/main.py
index 6a9f227..e947a49 100644
--- a/apiserver/main.py
+++ b/apiserver/main.py
@@ -47,14 +47,7 @@ app = FastAPI(
app.add_middleware(SessionMiddleware, secret_key="secret-string") # TODO generate secret string during startup
origins = [
- "https://datacatalog.fz-juelich.de",
- "https://datacatalogue.eflows4hpc.eu",
- "https://zam10059.zam.kfa-juelich.de",
- "https://zam10036.zam.kfa-juelich.de",
- "http://datacatalog.fz-juelich.de",
- "http://datacatalogue.eflows4hpc.eu",
- "http://zam10059.zam.kfa-juelich.de",
- "http://zam10036.zam.kfa-juelich.de"
+ "http://203.145.218.64:8001"
]
app.add_middleware(CORSMiddleware,
diff --git a/frontend/templates/index_content.html.jinja b/frontend/templates/index_content.html.jinja
index b833645..12238dd 100644
--- a/frontend/templates/index_content.html.jinja
+++ b/frontend/templates/index_content.html.jinja
@@ -11,7 +11,7 @@
<h5>For more information about the eFlows4HPC project, please see the <a href="https://eflows4hpc.eu/">project website.</a></h5>
<div><img class="img-fluid" src="img/Colorweb.png" alt="eFlows4HPC Logo" /></div>
<h2>API Documentation</h2>
- <p>The backend-API of the Datacatalog is compatible with the <a href="https://swagger.io/specification/">openAPI</a> specification. The API-documentation for the Datacatalog is available in the <a href="{% raw %}{{API_URL}}{% endraw %}openapi.json">openapi.json</a> file. <br />A nicer view of the documentation, which includes examples requests for every API-function, is available <a href="docs">here</a>.</p>
+ <p>The backend-API of the Datacatalog is compatible with the <a href="https://swagger.io/specification/">openAPI</a> specification. The API-documentation for the Datacatalog is available in the <a href="{% raw %}{{API_URL}}{% endraw %}openapi.json">openapi.json</a> file. <br />A nicer view of the documentation, which includes examples requests for every API-function, is available <a href="{% raw %}{{API_URL}}{% endraw %}docs">here</a>.</p>
<p>For readonly acces, please read the following:</p>
<p>Each dataset is identified by its <code>type</code> and <code>oid</code>. To access and view the dataset in your browser via the frontend, navigate to <code>./storage.html?type=DATASETTYPE&oid=DATASET_OID</code>. The response will be a html document that will display the dataset. <br />For access via the API, navigate to <code>{% raw %}{{API_URL}}{% endraw %}DATASETTYPE/DATASET_OID</code>. The response will be a json document that contains all information about the dataset.</p>
<p>It is also possible to list all datasets of a specific type. To do this via your browser, navigate to <code>./storage.html?type=DATASETTYPE</code>. The response will be a html document that will display the list of datasets. <br />For access via the API, navigate to <code>{% raw %}{{API_URL}}{% endraw %}DATASETTYPE</code>. The response will be a json document that contains all datasets of the specified type.</p>
```
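The repository does not ship this diff as a file; assuming it is saved as `datacatalog-cors.patch` (an arbitrary name), it can be applied from the `datacatalog` repository root:
```bash
git apply --check datacatalog-cors.patch  # dry run, prints nothing on success
git apply datacatalog-cors.patch
```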
* Create admin account `dcuser` and generate user database
```bash
python3 userdb-cli.py -u dcuser -m dcuser@mail.addr -p [REDACTED] -s add userdb.json
```
* Generate static website
```bash
python3 frontend/createStatic.py --api-url "http://203.145.218.64:8000/"
```
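Assuming the generator writes its output to `site/` (the directory mounted into the `httpd` container below), a quick check that the pages were produced:
```bash
ls site/
```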
* Deactivate python virtual environment
```bash
deactivate
```
* Start backend (API server) service
```bash
docker run -d --name data-catalog-apiserver --restart unless-stopped \
  -p 8000:8000 \
  -v /home/ubuntu/datacatalog/apiserver/main.py:/home/apiserver/apiserver/main.py \
  -v /home/ubuntu/datacatalog/userdb.json:/home/apiserver/mnt/userdb.json \
  registry.jsc.fz-juelich.de/eflows4hpc-wp2/datacatalog:stable-0.32
```
* Start frontend (static website) service
```bash
docker run -d --name data-catalog-frontend --restart unless-stopped \
  -p 8001:80 \
  -v /home/ubuntu/datacatalog/site:/usr/local/apache2/htdocs \
  httpd:latest
```
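* Smoke-test both services
A rough check using the host address and ports configured above: FastAPI serves its OpenAPI description at `openapi.json`, and the frontend should answer with HTTP 200.
```bash
curl -sf http://203.145.218.64:8000/openapi.json | head -c 200; echo
curl -sI http://203.145.218.64:8001/ | head -n 1
```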
#### Data Logistics Service
* Apply patch
Fix the log-directory paths, point the reverse proxy at the rootless Docker socket, drop the Let's Encrypt helper, and replace the public hostname with this host's address.
```diff
diff --git a/dockers/docker-compose.yaml b/dockers/docker-compose.yaml
index 0204d07..fa33df7 100644
--- a/dockers/docker-compose.yaml
+++ b/dockers/docker-compose.yaml
@@ -62,7 +62,7 @@ x-airflow-common:
volumes:
- ./dags:/opt/airflow/dags
- ./config/airflow.cfg:/opt/airflow/airflow.cfg
- - /persistent_data/logs:/opt/airflow/logs
+ - ./persistent_data/logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
user: "${AIRFLOW_UID:-50000}:0"
depends_on:
@@ -82,28 +82,12 @@ services:
- "dhparam:/etc/nginx/dhparam"
- "vhost:/etc/nginx/vhost.d"
- "certs:/etc/nginx/certs"
- - "/run/docker.sock:/tmp/docker.sock:ro"
+ - "$XDG_RUNTIME_DIR/docker.sock:/tmp/docker.sock:ro"
restart: "always"
ports:
- "80:80"
- "443:443"
- letsencrypt:
- image: "jrcs/letsencrypt-nginx-proxy-companion:latest"
- container_name: "letsencrypt-helper"
- volumes:
- - "html:/usr/share/nginx/html"
- - "dhparam:/etc/nginx/dhparam"
- - "vhost:/etc/nginx/vhost.d"
- - "certs:/etc/nginx/certs"
- - "/run/docker.sock:/var/run/docker.sock:ro"
- environment:
- NGINX_PROXY_CONTAINER: "reverse-proxy"
- DEFAULT_EMAIL: "m.petrova@fz-juelich.de"
- restart: "always"
- depends_on:
- - "reverse-proxy"
-
postgres:
image: postgres:13
environment:
@@ -137,8 +121,7 @@ services:
environment:
<<: *airflow-common-env
- VIRTUAL_HOST: datalogistics.eflows4hpc.eu
- LETSENCRYPT_HOST: datalogistics.eflows4hpc.eu
+ VIRTUAL_HOST: 203.145.218.64
VIRTUAL_PORT: 8080
healthcheck:
@@ -185,7 +168,7 @@ services:
volumes:
- ./dags:/opt/airflow/dags
- ./config/airflow.cfg:/opt/airflow/airflow.cfg
- - /persistent_data/logs:/opt/airflow/logs
+ - ./persistent_data/logs:/opt/airflow/logs
- ./tmp/:/work/
depends_on:
<<: *airflow-common-depends-on
```
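As with the Data Catalog, save the diff above to a file (e.g. `dls.patch`, an arbitrary name) and apply it from the `data-logistics-service` repository root; `docker compose config` can then validate the patched file without starting anything:
```bash
git apply --check dls.patch && git apply dls.patch
docker compose -f dockers/docker-compose.yaml --project-directory . config --quiet
```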
* Create data volumes and directories in advance (inside the cloned `data-logistics-service` directory)
```bash
docker volume create persistent_postgres-db-volume
docker volume create persistent_certs
mkdir -p persistent_data/logs/dag_processor_manager
mkdir -p persistent_data/logs/scheduler
```
* Start service
```bash
docker compose -f dockers/docker-compose.yaml --project-directory . up airflow-init
docker compose -f dockers/docker-compose.yaml --project-directory . up -d
```
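* Check that the stack is healthy
A rough check (assumes the patched `VIRTUAL_HOST` above): Airflow exposes a JSON health endpoint through the nginx reverse proxy.
```bash
docker compose -f dockers/docker-compose.yaml --project-directory . ps
curl -s http://203.145.218.64/health
```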
* Create admin account `dlsuser` inside the `data-logistics-service-airflow-webserver-1` container
```bash
docker exec -it data-logistics-service-airflow-webserver-1 bash
airflow users create -e dlsuser@mail.addr -f User -l DLS -p [REDACTED] -r Admin -u dlsuser
exit
```
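* Optionally verify the account from outside the container
```bash
docker exec data-logistics-service-airflow-webserver-1 airflow users list
```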
#### Alien 4 Cloud & Ystia Orchestrator
* Download Ystia Orchestrator
```bash
wget https://github.com/ystia/yorc/releases/download/v4.3.0/yorc-4.3.0.tgz
tar -xf yorc-4.3.0.tgz
```
* Create python virtual environment `yorc-env` and let `yorc` install its prerequisites inside it
```bash
python3 -m venv yorc-env
source yorc-env/bin/activate
python3 -m pip install wheel
```
* Prepare configuration file `deploy.yaml` for deployment
Fill in `password`, `private_key_content`, `private_key_file`, `ca_passphrase`, etc., and adapt the other values as appropriate.
```=
alien4cloud:
download_url: https://www.portaildulibre.fr/nexus/repository/opensource-releases/alien4cloud/alien4cloud-premium-dist/3.7.0/alien4cloud-premium-dist-3.7.0-dist.tar.gz
port: 8088
protocol: http
user: a4cuser
password: [REDACTED]
extra_env: ""
yorcplugin:
download_url: ""
consul:
download_url: https://releases.hashicorp.com/consul/1.11.11/consul_1.11.11_linux_amd64.zip
port: 8500
tls_enabled: false
tls_for_checks_enabled: false
encrypt_key: ""
terraform:
download_url: https://releases.hashicorp.com/terraform/0.11.15/terraform_0.11.15_linux_amd64.zip
plugins_download_urls:
- https://releases.hashicorp.com/terraform-provider-null/1.0.0/terraform-provider-null_1.0.0_linux_amd64.zip
- https://releases.hashicorp.com/terraform-provider-aws/1.36.0/terraform-provider-aws_1.36.0_linux_amd64.zip
- https://releases.hashicorp.com/terraform-provider-consul/2.1.0/terraform-provider-consul_2.1.0_linux_amd64.zip
- https://releases.hashicorp.com/terraform-provider-google/1.18.0/terraform-provider-google_1.18.0_linux_amd64.zip
- https://releases.hashicorp.com/terraform-provider-openstack/1.32.0/terraform-provider-openstack_1.32.0_linux_amd64.zip
yorc:
download_url: https://github.com/ystia/yorc/releases/download/v4.3.0/yorc-4.3.0.tgz
port: 8800
protocol: http
private_key_content: |
-----BEGIN RSA PRIVATE KEY-----
[REDACTED]
-----END RSA PRIVATE KEY-----
private_key_file: [REDACTED]
ca_pem: ""
ca_pem_file: ""
ca_key: ""
ca_key_file: ""
ca_passphrase: [REDACTED]
data_dir: /var/yorc
workers_number: 30
resources_prefix: yorc-
locations: []
compute:
shareable: true
address: {}
location:
type: HostsPool
name: Host-Pool
resourcesfile: resources/ondemand_resources_hostspool.yaml
properties: {}
hosts:
- name: eflows4hpc0cn1
connection:
user: ubuntu
private_key: /var/yorc/.ssh/yorc.pem
host: eflows4hpc0cn1
port: 22
labels:
host.cpu_frequency: "2.5 GHz"
host.disk_size: "50 GB"
host.mem_size: "32 GB"
host.num_cpus: 4
os.architecture: "x86_64"
os.distribution: "ubuntu"
os.type: "linux"
os.version: "22.04"
private_address: 192.168.211.31
public_address: 192.168.211.31
- name: eflows4hpc0cn2
connection:
user: ubuntu
private_key: /var/yorc/.ssh/yorc.pem
host: eflows4hpc0cn2
port: 22
labels:
host.cpu_frequency: "2.5 GHz"
host.disk_size: "50 GB"
host.mem_size: "16 GB"
host.num_cpus: 2
os.architecture: "x86_64"
os.distribution: "ubuntu"
os.type: "linux"
os.version: "22.04"
private_address: 192.168.211.25
public_address: 192.168.211.25
vault:
download_url: https://releases.hashicorp.com/vault/1.0.3/vault_1.0.3_linux_amd64.zip
port: 8200
insecure: true
```
* Deploy Alien 4 Cloud and Ystia Orchestrator together
```bash
./yorc bootstrap --insecure --values deploy.yaml
```
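* Optionally check the deployed endpoints
The bootstrap takes a while; once it reports success, Alien 4 Cloud should answer on port 8088 of the host it was deployed to (eflows4hpc0cn1 in this setup) and the Yorc REST API on port 8800, as configured in `deploy.yaml`. A rough check:
```bash
curl -sI http://eflows4hpc0cn1:8088/ | head -n 1
curl -sI http://eflows4hpc0cn1:8800/ | head -n 1
```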
* Deactivate python virtual environment
```bash
deactivate
```