# NCHC Project: eFlows4HPC PoC

This document describes the steps for setting up live instances of eFlows4HPC services.

**Warning: After this PoC, I have concluded that eFlows4HPC is a premature (i.e., alpha quality at best) project. Documentation is severely lacking and some functionalities do not work at all. Do not use in production.**

## Preparation

* Prepare 3 virtual machines (4 vCPU and 32 GB memory each are more than enough; Ubuntu 22.04 LTS is used here):
  * Hostname `eflows4hpc0ap` (Application server)
  * Hostname `eflows4hpc0cn1` (Compute node 1, Alien 4 Cloud frontend)
  * Hostname `eflows4hpc0cn2` (Compute node 2)
* Add entries to `/etc/hosts` on all of them

```=
192.168.211.24 eflows4hpc0ap
192.168.211.31 eflows4hpc0cn1
192.168.211.25 eflows4hpc0cn2
```

* Perform a system upgrade on all of them

```bash
sudo apt update
sudo apt upgrade
```

## For `eflows4hpc0cn1`, `eflows4hpc0cn2`

* Allow the deprecated `ssh-rsa` signature algorithm

  This is needed for Ystia Orchestrator because the version of the Paramiko SSH library it relies on is too old. Deployment will fail early if `ssh-rsa` is not allowed.

```bash
echo "PubkeyAcceptedAlgorithms=+ssh-rsa" | sudo tee /etc/ssh/sshd_config.d/allow-ssh-rsa.conf
sudo systemctl restart sshd
```
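* Optionally verify that the override is active (my addition, not part of the original procedure); `sshd -T` dumps the effective server configuration, and `ssh-rsa` should now appear among the accepted algorithms:

```bash
# The effective setting should list ssh-rsa among the accepted public key algorithms
sudo sshd -T | grep -i pubkeyacceptedalgorithms
```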
## For `eflows4hpc0ap`

### Install Docker

* Install Docker from its official `apt` repository
  * Configure the repository

```bash
sudo apt install ca-certificates curl gnupg lsb-release
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
```

  * Install Docker

```bash
sudo apt update
sudo apt install dbus-user-session fuse-overlayfs slirp4netns uidmap
sudo apt install docker-ce docker-ce-cli docker-ce-rootless-extras containerd.io docker-compose-plugin
```

* Enable and start the Docker services

```bash
sudo systemctl enable --now containerd.service
sudo systemctl enable --now docker.service
```

* Configure Docker in **rootless** mode for better security

```bash
dockerd-rootless-setuptool.sh install

# Allow binding privileged ports
sudo setcap cap_net_bind_service=ep $(which rootlesskit)
systemctl --user restart docker

# Allow automatic startup
sudo loginctl enable-linger $(whoami)
```
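* As a quick sanity check (my addition, not from the upstream docs), confirm that the client now talks to the rootless daemon; the socket path below is the default one created by `dockerd-rootless-setuptool.sh` and is the same `$XDG_RUNTIME_DIR/docker.sock` referenced later in the Data Logistics Service patch:

```bash
# Point the client at the rootless daemon socket (the setup tool prints the same hint)
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock

# "name=rootless" should appear among the reported security options
docker info --format '{{.SecurityOptions}}'
```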
### Set Up eFlows4HPC Services

* Install the necessary Python packages

```bash
sudo apt install python3-dev python3-pip python3-venv
```

* Pull the necessary container images

```bash
docker pull httpd:latest
docker pull registry.jsc.fz-juelich.de/eflows4hpc-wp2/datacatalog:stable-0.32
docker pull registry.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/eflows-airflow:latest
```

* `git clone` the necessary code repositories

```bash
git clone https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/datacatalog.git
git clone https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service.git
```

#### Data Catalog

* Create the Python virtual environment `dc-env` and install the prerequisites inside it

```bash
python3 -m venv dc-env
source dc-env/bin/activate
python3 -m pip install passlib pydantic wheel
python3 -m pip install -r requirements.txt
```

* Apply patch

  Correctly configure Cross-Origin Resource Sharing (CORS) for the backend (API server); otherwise the frontend (static website) simply will not work, because user logins fail with errors.

```diff
diff --git a/apiserver/main.py b/apiserver/main.py
index 6a9f227..e947a49 100644
--- a/apiserver/main.py
+++ b/apiserver/main.py
@@ -47,14 +47,7 @@ app = FastAPI(
 app.add_middleware(SessionMiddleware, secret_key="secret-string") # TODO generate secret string during startup
 
 origins = [
-    "https://datacatalog.fz-juelich.de",
-    "https://datacatalogue.eflows4hpc.eu",
-    "https://zam10059.zam.kfa-juelich.de",
-    "https://zam10036.zam.kfa-juelich.de",
-    "http://datacatalog.fz-juelich.de",
-    "http://datacatalogue.eflows4hpc.eu",
-    "http://zam10059.zam.kfa-juelich.de",
-    "http://zam10036.zam.kfa-juelich.de"
+    "http://203.145.218.64:8001"
 ]
 
 app.add_middleware(CORSMiddleware,
diff --git a/frontend/templates/index_content.html.jinja b/frontend/templates/index_content.html.jinja
index b833645..12238dd 100644
--- a/frontend/templates/index_content.html.jinja
+++ b/frontend/templates/index_content.html.jinja
@@ -11,7 +11,7 @@
 <h5>For more information about the eFlows4HPC project, please see the <a href="https://eflows4hpc.eu/">project website.</a></h5>
 <div><img class="img-fluid" src="img/Colorweb.png" alt="eFlows4HPC Logo" /></div>
 <h2>API Documentation</h2>
-<p>The backend-API of the Datacatalog is compatible with the <a href="https://swagger.io/specification/">openAPI</a> specification. The API-documentation for the Datacatalog is available in the <a href="{% raw %}{{API_URL}}{% endraw %}openapi.json">openapi.json</a> file. <br />A nicer view of the documentation, which includes examples requests for every API-function, is available <a href="docs">here</a>.</p>
+<p>The backend-API of the Datacatalog is compatible with the <a href="https://swagger.io/specification/">openAPI</a> specification. The API-documentation for the Datacatalog is available in the <a href="{% raw %}{{API_URL}}{% endraw %}openapi.json">openapi.json</a> file. <br />A nicer view of the documentation, which includes examples requests for every API-function, is available <a href="{% raw %}{{API_URL}}{% endraw %}docs">here</a>.</p>
 <p>For readonly acces, please read the following:</p>
 <p>Each dataset is identified by its <code>type</code> and <code>oid</code>. To access and view the dataset in your browser via the frontend, navigate to <code>./storage.html?type=DATASETTYPE&amp;oid=DATASET_OID</code>. The response will be a html document that will display the dataset. <br />For access via the API, navigate to <code>{% raw %}{{API_URL}}{% endraw %}DATASETTYPE/DATASET_OID</code>. The response will be a json document that contains all information about the dataset.</p>
 <p>It is also possible to list all datasets of a specific type. To do this via your browser, navigate to <code>./storage.html?type=DATASETTYPE</code>. The response will be a html document that will display the list of datasets. <br />For access via the API, navigate to <code>{% raw %}{{API_URL}}{% endraw %}DATASETTYPE</code>. The response will be a json document that contains all datasets of the specified type.</p>
```

* Create the admin account `dcuser` and generate the user database

```bash
python3 userdb-cli.py -u dcuser -m dcuser@mail.addr -p [REDACTED] -s add userdb.json
```

* Generate the static website

```bash
python3 frontend/createStatic.py --api-url "http://203.145.218.64:8000/"
```

* Deactivate the Python virtual environment

```bash
deactivate
```

* Start the backend (API server) service

```bash
docker run -d --name data-catalog-apiserver --restart unless-stopped \
  -p 8000:8000 \
  -v /home/ubuntu/datacatalog/apiserver/main.py:/home/apiserver/apiserver/main.py \
  -v /home/ubuntu/datacatalog/userdb.json:/home/apiserver/mnt/userdb.json \
  registry.jsc.fz-juelich.de/eflows4hpc-wp2/datacatalog:stable-0.32
```

* Start the frontend (static website) service

```bash
docker run -d --name data-catalog-frontend --restart unless-stopped \
  -p 8001:80 \
  -v /home/ubuntu/datacatalog/site:/usr/local/apache2/htdocs \
  httpd:latest
```
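* A quick smoke test (my addition, not from the upstream docs): the patched frontend page links to the API's `openapi.json` and `docs` endpoints, so both containers can be probed with `curl` right away:

```bash
# API server should serve its OpenAPI spec (referenced by the patched index page)
curl -fsS http://203.145.218.64:8000/openapi.json | head -c 200; echo

# Frontend should answer with the generated static site
curl -fsSI http://203.145.218.64:8001/
```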
#### Data Logistics Service

* Apply patch

  Fix paths and correctly configure the reverse proxy.

```diff
diff --git a/dockers/docker-compose.yaml b/dockers/docker-compose.yaml
index 0204d07..fa33df7 100644
--- a/dockers/docker-compose.yaml
+++ b/dockers/docker-compose.yaml
@@ -62,7 +62,7 @@ x-airflow-common:
   volumes:
     - ./dags:/opt/airflow/dags
     - ./config/airflow.cfg:/opt/airflow/airflow.cfg
-    - /persistent_data/logs:/opt/airflow/logs
+    - ./persistent_data/logs:/opt/airflow/logs
     - ./plugins:/opt/airflow/plugins
   user: "${AIRFLOW_UID:-50000}:0"
   depends_on:
@@ -82,28 +82,12 @@ services:
       - "dhparam:/etc/nginx/dhparam"
       - "vhost:/etc/nginx/vhost.d"
       - "certs:/etc/nginx/certs"
-      - "/run/docker.sock:/tmp/docker.sock:ro"
+      - "$XDG_RUNTIME_DIR/docker.sock:/tmp/docker.sock:ro"
     restart: "always"
     ports:
       - "80:80"
       - "443:443"
 
-  letsencrypt:
-    image: "jrcs/letsencrypt-nginx-proxy-companion:latest"
-    container_name: "letsencrypt-helper"
-    volumes:
-      - "html:/usr/share/nginx/html"
-      - "dhparam:/etc/nginx/dhparam"
-      - "vhost:/etc/nginx/vhost.d"
-      - "certs:/etc/nginx/certs"
-      - "/run/docker.sock:/var/run/docker.sock:ro"
-    environment:
-      NGINX_PROXY_CONTAINER: "reverse-proxy"
-      DEFAULT_EMAIL: "m.petrova@fz-juelich.de"
-    restart: "always"
-    depends_on:
-      - "reverse-proxy"
-
   postgres:
     image: postgres:13
     environment:
@@ -137,8 +121,7 @@ services:
     environment:
       <<: *airflow-common-env
-      VIRTUAL_HOST: datalogistics.eflows4hpc.eu
-      LETSENCRYPT_HOST: datalogistics.eflows4hpc.eu
+      VIRTUAL_HOST: 203.145.218.64
       VIRTUAL_PORT: 8080
 
     healthcheck:
@@ -185,7 +168,7 @@ services:
     volumes:
       - ./dags:/opt/airflow/dags
       - ./config/airflow.cfg:/opt/airflow/airflow.cfg
-      - /persistent_data/logs:/opt/airflow/logs
+      - ./persistent_data/logs:/opt/airflow/logs
       - ./tmp/:/work/
     depends_on:
       <<: *airflow-common-depends-on
```

* Create data volumes and directories in advance

```bash
docker volume create persistent_postgres-db-volume
docker volume create persistent_certs
mkdir -p persistent_data/logs/dag_processor_manager
mkdir -p persistent_data/logs/scheduler
```

* Start the service

```bash
docker compose -f dockers/docker-compose.yaml --project-directory . up airflow-init
docker compose -f dockers/docker-compose.yaml --project-directory . up -d
```

* Create the admin account `dlsuser` inside the `data-logistics-service-airflow-webserver-1` container

```bash
docker exec -it data-logistics-service-airflow-webserver-1 bash
airflow users create -e dlsuser@mail.addr -f User -l DLS -p [REDACTED] -r Admin -u dlsuser
exit
```
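* Optionally (my addition), confirm that Airflow is reachable through the nginx reverse proxy and that the DLS DAGs were picked up; `/health` and `airflow dags list` are standard Airflow, and the address assumes the `VIRTUAL_HOST` value patched in above:

```bash
# Webserver health through the reverse proxy (VIRTUAL_HOST patched above)
curl -fsS http://203.145.218.64/health; echo

# DAGs loaded from ./dags inside the webserver container
docker exec data-logistics-service-airflow-webserver-1 airflow dags list
```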
#### Alien 4 Cloud & Ystia Orchestrator

* Download Ystia Orchestrator

```bash
wget https://github.com/ystia/yorc/releases/download/v4.3.0/yorc-4.3.0.tgz
tar -xf yorc-4.3.0.tgz
```

* Create the Python virtual environment `yorc-env` and let `yorc` install its prerequisites inside it

```bash
python3 -m venv yorc-env
source yorc-env/bin/activate
python3 -m pip install wheel
```

* Prepare the configuration file `deploy.yaml` for the deployment

  Fill in `password`, `private_key_content`, `private_key_file`, `ca_passphrase`, etc., and adapt the other values as appropriate.

```=
alien4cloud:
  download_url: https://www.portaildulibre.fr/nexus/repository/opensource-releases/alien4cloud/alien4cloud-premium-dist/3.7.0/alien4cloud-premium-dist-3.7.0-dist.tar.gz
  port: 8088
  protocol: http
  user: a4cuser
  password: [REDACTED]
  extra_env: ""
yorcplugin:
  download_url: ""
consul:
  download_url: https://releases.hashicorp.com/consul/1.11.11/consul_1.11.11_linux_amd64.zip
  port: 8500
  tls_enabled: false
  tls_for_checks_enabled: false
  encrypt_key: ""
terraform:
  download_url: https://releases.hashicorp.com/terraform/0.11.15/terraform_0.11.15_linux_amd64.zip
  plugins_download_urls:
    - https://releases.hashicorp.com/terraform-provider-null/1.0.0/terraform-provider-null_1.0.0_linux_amd64.zip
    - https://releases.hashicorp.com/terraform-provider-aws/1.36.0/terraform-provider-aws_1.36.0_linux_amd64.zip
    - https://releases.hashicorp.com/terraform-provider-consul/2.1.0/terraform-provider-consul_2.1.0_linux_amd64.zip
    - https://releases.hashicorp.com/terraform-provider-google/1.18.0/terraform-provider-google_1.18.0_linux_amd64.zip
    - https://releases.hashicorp.com/terraform-provider-openstack/1.32.0/terraform-provider-openstack_1.32.0_linux_amd64.zip
yorc:
  download_url: https://github.com/ystia/yorc/releases/download/v4.3.0/yorc-4.3.0.tgz
  port: 8800
  protocol: http
  private_key_content: |
    -----BEGIN RSA PRIVATE KEY-----
    [REDACTED]
    -----END RSA PRIVATE KEY-----
  private_key_file: [REDACTED]
  ca_pem: ""
  ca_pem_file: ""
  ca_key: ""
  ca_key_file: ""
  ca_passphrase: [REDACTED]
  data_dir: /var/yorc
  workers_number: 30
  resources_prefix: yorc-
locations: []
compute:
  shareable: true
  address: {}
location:
  type: HostsPool
  name: Host-Pool
  resourcesfile: resources/ondemand_resources_hostspool.yaml
  properties: {}
hosts:
  - name: eflows4hpc0cn1
    connection:
      user: ubuntu
      private_key: /var/yorc/.ssh/yorc.pem
      host: eflows4hpc0cn1
      port: 22
    labels:
      host.cpu_frequency: "2.5 GHz"
      host.disk_size: "50 GB"
      host.mem_size: "32 GB"
      host.num_cpus: 4
      os.architecture: "x86_64"
      os.distribution: "ubuntu"
      os.type: "linux"
      os.version: "22.04"
      private_address: 192.168.211.31
      public_address: 192.168.211.31
  - name: eflows4hpc0cn2
    connection:
      user: ubuntu
      private_key: /var/yorc/.ssh/yorc.pem
      host: eflows4hpc0cn2
      port: 22
    labels:
      host.cpu_frequency: "2.5 GHz"
      host.disk_size: "50 GB"
      host.mem_size: "16 GB"
      host.num_cpus: 2
      os.architecture: "x86_64"
      os.distribution: "ubuntu"
      os.type: "linux"
      os.version: "22.04"
      private_address: 192.168.211.25
      public_address: 192.168.211.25
vault:
  download_url: https://releases.hashicorp.com/vault/1.0.3/vault_1.0.3_linux_amd64.zip
  port: 8200
  insecure: true
```

* Deploy Alien 4 Cloud and Ystia Orchestrator together

```bash
./yorc bootstrap --insecure --values deploy.yaml
```

* Deactivate the Python virtual environment

```bash
deactivate
```
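* A final sanity check (my addition, not part of the official procedure): once the bootstrap finishes, the Alien 4 Cloud web UI should answer on the port and protocol configured in `deploy.yaml`; the hostname below assumes it ended up on `eflows4hpc0cn1`, the node labelled as the Alien 4 Cloud frontend in the Preparation section:

```bash
# Alien 4 Cloud UI (port/protocol from deploy.yaml; the target host is an assumption)
curl -fsSI http://eflows4hpc0cn1:8088/
```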