# Alternative Graph Backend deployment

We are moving away from Stardog as a graph backend, mostly because they no longer provide a free academic license but instead provide short-term "trials". Take a look at https://github.com/neurobagel/planning/issues/9 to see our progress in picking a replacement. In the meantime, here are instructions for deploying [graphDB](https://graphdb.ontotext.com/) as our graph backend instead of Stardog.

## Configure the environment variables

Follow the [Launch the API](https://neurobagel.org/infrastructure/#launch-the-api-and-graph-stack) section of our public docs, but change the following variables in the `.env` file from [the defaults described in the docs](https://neurobagel.org/infrastructure/#set-the-environment-variables):

```sh
NB_GRAPH_IMG=ontotext/graphdb:10.3.1
NB_GRAPH_ROOT_CONT=/opt/graphdb/home
NB_GRAPH_PORT=7200
NB_GRAPH_PORT_HOST=7200
NB_GRAPH_DB=repositories/my_db  # NOTE: for graphDB, this value should always take the format of: repositories/<your_database_name>
```

Make a copy of [the default `docker-compose.yml`](https://github.com/neurobagel/api/blob/main/docker-compose.yml) file in the same directory and then run `docker compose up -d` to launch the Neurobagel services. Refer to [the API readme](https://github.com/neurobagel/api/blob/main/README.md) for additional instructions.

## First time setup commands

When the API, graph, and query tool have been started and are running for the first time, you will have to do some first-run configuration.

### Setup security and users

Also refer to https://graphdb.ontotext.com/documentation/10.0/devhub/rest-api/curl-commands.html#security-management

First, change the password for the admin user that has been automatically created by graphDB:

```
curl -X PATCH --header 'Content-Type: application/json' http://localhost:7200/rest/security/users/admin -d '
{"password": "NewAdminPassword"}'
```

Make sure to replace `"NewAdminPassword"` with your own secure password.

Next, enable graphDB security to only allow authenticated users access:

```
curl -X POST --header 'Content-Type: application/json' -d true http://localhost:7200/rest/security
```

and confirm that this was successful:

```
➜ curl -X POST http://localhost:7200/rest/security
Unauthorized (HTTP status 401)
```

Now we can create a user for the API (here and in the commands below, `NewAdminPassword` stands for the admin password you set above):

```
curl -X POST --header 'Content-Type: application/json' -u "admin:NewAdminPassword" -d '
{
"username": "DBUSER",
"password": "DBPASSWORD"
}' http://localhost:7200/rest/security/users/DBUSER
```

### Create a graph database

In graphDB, graph databases are called repositories. To create a new one, you will also have to prepare a `data-config.ttl` file that contains the settings for the repository you will create ([see the graphDB docs](https://graphdb.ontotext.com/documentation/10.0/devhub/rest-api/location-and-repository-tutorial.html#create-a-repository)).

**Make sure that the value for `rep:repositoryID` in the `data-config.ttl` file matches the database name in `NB_GRAPH_DB` in your `.env` file**. For example, if `NB_GRAPH_DB=repositories/my_db`, then `rep:repositoryID "my_db" ;`.

You can use this example file and save it as `data-config.ttl` locally:

```
#
# RDF4J configuration template for a GraphDB repository
#
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix graphdb: <http://www.ontotext.com/config/graphdb#>.

[] a rep:Repository ;
    rep:repositoryID "my_db" ;
    rdfs:label "" ;
    rep:repositoryImpl [
        rep:repositoryType "graphdb:SailRepository" ;
        sr:sailImpl [
            sail:sailType "graphdb:Sail" ;

            graphdb:read-only "false" ;

            # Inference and Validation
            graphdb:ruleset "rdfsplus-optimized" ;
            graphdb:disable-sameAs "true" ;
            graphdb:check-for-inconsistencies "false" ;

            # Indexing
            graphdb:entity-id-size "32" ;
            graphdb:enable-context-index "false" ;
            graphdb:enablePredicateList "true" ;
            graphdb:enable-fts-index "false" ;
            graphdb:fts-indexes ("default" "iri") ;
            graphdb:fts-string-literals-index "default" ;
            graphdb:fts-iris-index "none" ;

            # Queries and Updates
            graphdb:query-timeout "0" ;
            graphdb:throw-QueryEvaluationException-on-timeout "false" ;
            graphdb:query-limit-results "0" ;

            # Settable in the file but otherwise hidden in the UI and in the RDF4J console
            graphdb:base-URL "http://example.org/owlim#" ;
            graphdb:defaultNS "" ;
            graphdb:imports "" ;
            graphdb:repository-type "file-repository" ;
            graphdb:storage-folder "storage" ;
            graphdb:entity-index-size "10000000" ;
            graphdb:in-memory-literal-properties "true" ;
            graphdb:enable-literal-index "true" ;
        ]
    ].
```

Then you can create a new graph db with the following command (replace "my_db" as needed):

```bash
curl -X PUT -u "admin:NewAdminPassword" http://localhost:7200/repositories/my_db --data-binary "@data-config.ttl" -H "Content-Type: application/x-turtle"
```

and give our user access permission to the new repository:

```
curl -X PUT --header 'Content-Type: application/json' -d '
{"grantedAuthorities": ["WRITE_REPO_my_db","READ_REPO_my_db"]}' http://localhost:7200/rest/security/users/DBUSER -u "admin:NewAdminPassword"
```

- `"WRITE_REPO_my_db"`: Grants write permission.
- `"READ_REPO_my_db"`: Grants read permission.

**Note**: make sure you replace `my_db` with the name of the graph db you have just created.

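Because the repository name appears in several places (`NB_GRAPH_DB`, `rep:repositoryID`, and the two authority strings), it can help to derive all of them from one value. The following is a minimal sketch, not part of the Neurobagel tooling: `repo_name` and `authorities_json` are hypothetical helper names, and the sketch assumes `NB_GRAPH_DB` follows the `repositories/<your_database_name>` format described above.

```shell
#!/usr/bin/env bash
# Sketch: derive the repository name and the GraphDB authority strings
# from NB_GRAPH_DB. Helper names are hypothetical, for illustration only.

# Strip the leading "repositories/" prefix to get the bare repository name.
# Fails if the value does not follow repositories/<your_database_name>.
repo_name() {
    local graph_db="$1"
    case "$graph_db" in
        repositories/?*) printf '%s\n' "${graph_db#repositories/}" ;;
        *)
            echo "NB_GRAPH_DB must look like repositories/<your_database_name>" >&2
            return 1
            ;;
    esac
}

# Build the JSON body used in the grantedAuthorities request above.
authorities_json() {
    local repo="$1"
    printf '{"grantedAuthorities": ["WRITE_REPO_%s","READ_REPO_%s"]}\n' "$repo" "$repo"
}

# Fall back to the example value from this page if NB_GRAPH_DB is not set.
NB_GRAPH_DB="${NB_GRAPH_DB:-repositories/my_db}"
repo="$(repo_name "$NB_GRAPH_DB")"
authorities_json "$repo"
```

The printed JSON can be passed to the `curl -X PUT ... /rest/security/users/DBUSER` command via `-d "$(authorities_json "$repo")"`, and `$repo` is the value that `rep:repositoryID` in `data-config.ttl` should match.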
### Upload test data to the graph

To test that the above setup steps worked correctly, we can add some example graph-ready data (JSONLD files) to the new graph db from the [neurobagel/neurobagel_examples](https://github.com/neurobagel/neurobagel_examples) repository.

First, clone `neurobagel/neurobagel_examples`:

```bash
git clone https://github.com/neurobagel/neurobagel_examples.git
```

The `neurobagel/api` repo comes with a helper script [add_data_to_graph.sh](https://github.com/neurobagel/api/blob/main/add_data_to_graph.sh) to automatically upload all JSONLD files in a directory to a user-specified graph database, with the option to clear the existing data in the database first. _**A version of this script for a GraphDB endpoint is available from [here](https://gist.github.com/alyssadai/e10d0ba1d8e89d1564b7029b386e6637).**_

Download the `add_data_to_graph_graphdb.sh` script:

```bash
git clone https://gist.github.com/e10d0ba1d8e89d1564b7029b386e6637.git
```

To view all the command line arguments for the script:

```bash
./add_data_to_graph_graphdb.sh --help
```

> ℹ️ **Note: If you prefer to directly use `curl` requests to modify the graph database instead of the helper script**
>
> Add a single dataset to the graph database (example):
> ```bash
> curl -u "<USERNAME>:<PASSWORD>" -i -X POST http://localhost:7200/repositories/<DATABASE_NAME>/statements \
>     -H "Content-Type: application/ld+json" \
>     --data-binary @<DATASET_NAME>.jsonld
> ```
>
> Clear all data in the graph database (example):
> ```bash
> curl -u "<USERNAME>:<PASSWORD>" -X POST http://localhost:7200/repositories/<DATABASE_NAME>/statements \
>     -H "Content-Type: application/sparql-update" \
>     --data-binary "DELETE { ?s ?p ?o } WHERE { ?s ?p ?o }"
> ```

Now we will upload the data in the directory `neurobagel_examples/data-upload/pheno-bids-output` to the graph db we created above.
To do this, run the helper script as follows:

```bash
./add_data_to_graph_graphdb.sh PATH/TO/neurobagel_examples/data-upload/pheno-bids-output localhost:7200 repositories/my_db DBUSER DBPASSWORD \
    --clear-data
```

**NOTE:** Here we added the `--clear-data` flag to remove any existing data in the database (if the database is empty, the flag has no effect). You can choose to omit the flag or explicitly specify `--no-clear-data` (default behaviour) to skip this step.
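
One way to check that the upload changed the graph is to count the statements in the repository before and after running the script. The sketch below is an illustration under assumptions: `count_triples` is a hypothetical helper (not part of the Neurobagel scripts), it sends a standard SPARQL query as a form-encoded POST to the repository endpoint, and it takes the host, `NB_GRAPH_DB` value, user, and password in the same order as the helper script's positional arguments.

```shell
#!/usr/bin/env bash
# Sketch: count the statements in a GraphDB repository via a SPARQL query.
# count_triples is a hypothetical helper, for illustration only.

count_triples() {
    local host="$1" graph_db="$2" user="$3" password="$4"
    # --data-urlencode sends the query as a form-encoded POST body;
    # the Accept header requests the result as SPARQL JSON.
    curl --silent --fail -u "${user}:${password}" \
        -H "Accept: application/sparql-results+json" \
        --data-urlencode "query=SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }" \
        "http://${host}/${graph_db}"
}

# Run the query only when all four arguments are given on the command line.
if [ "$#" -eq 4 ]; then
    count_triples "$@"
fi
```

For example, saving this as `count_triples.sh` and running `bash count_triples.sh localhost:7200 repositories/my_db DBUSER DBPASSWORD` returns a JSON document whose `n` binding holds the count. Because the example repository configuration enables a ruleset, the count may include inferred statements, so compare the value before and after the upload rather than expecting a specific number.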