# Integrating Neo4J to a Shiny app

By [Elie ARNAUD](mailto:elie.arnaud@mnhn.fr), MNHN/PNDB, on 2021/04/06

## Goal

Neo4J is a database management system based on graph technology. Each node and relationship within a graph can be described by a number of properties, which makes it a powerful tool for many applications. In our case, the aim is to let the user browse a work environment centered on the French Biodiversity Information System (SIB).

The database is built with five types of nodes: Information Systems (noted `SI*`, with two subtypes, `SIF` and `SIm`, for unifying SI and job SI respectively), Applications (noted `is`, for tools), Actors (noted `organisme`, for institutional organisations or associations) and Data (noted `jdd` for "jeux de données", meaning datasets). The relationships between these entities vary with the diversity of the nodes.

The choice of Shiny was a challenge: we wanted to see how difficult it would be to link R and Neo4J. The {neo4r} package greatly helped us set up this connection. For reproducibility and scalability purposes, a docker-compose file has been written to ease the deployment of the app; the required modifications are explained in the matching paragraphs. The following notes cover the "local" setup, with Neo4J Desktop and RStudio.

## Structure

The application is divided into two main parts: the main page and the result pop-up. In short, the main page lets the user search the Neo4J database for anything they want and displays the results found; the pop-up appears when the user clicks one of these results.

The whole app relies on a separate Neo4J database, accessible through a local URL, so the database must be set up before using the app.

### Neo4J setup

We used Neo4J Desktop to build the application. The steps are simple; make sure the APOC plugin is installed.

1. Create a database.
   For compatibility with {neo4r}, we created it under version 3.5.
2. Copy the data files into the "import" folder of Neo4J (default is `/var/lib/neo4j/import`).
3. Open the Neo4J browser (in your favorite web browser).
4. Upload the import Cypher scripts into the Neo4J browser, saving them as "Favorites" to find them easily.
5. Run each script, ending with the one(s) setting up the relationships in your database.

Your Neo4J database is now set up and exposed on `http://localhost:7474`. We can now move on to the R part. {neo4r} offers an R6 object to establish the connection with the database:

```r
con <- neo4r::neo4j_api$new(
  url = "http://localhost:7474",
  user = "neo4j",     # default
  password = "neo4j"  # default
)
```

To check whether the database is accessible, you can ping it with `con$ping()` and expect a `200` status code in return.

### Main page

The main page of the app is itself divided into several parts.

#### Header

A common header displays the title and logos. We also added a database status indicator that tells the user whether the database is accessible (using the ping method). The user can refresh this indicator at any time with a button to check the connection.

#### Query

Using {shinyWidgets}' `searchInput()` and `radioGroupButtons()`, the search UI lets the user type in words that will be looked for in the nodes' main label, long label and description. The number of results is limited to 50 by default, but the user can restrict it to 40, 30, 20 or 10: the fewer results they want, the more precise their query has to be. The aim of this search is not to return as many results as possible, but to return the desired ones.

#### Results

The results are displayed within a `checkboxGroupButtons()` rendered by a `renderUI()` according to the returned results. This was preferred over a varying number of individual inputs, one per result. The query fetches a graph, of which only the nodes are kept and displayed in the results.
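
For illustration, here is a minimal sketch of how such a search query could be assembled in R before being sent with {neo4r}. It is not the app's actual code: the helper name `build_search_query()`, the exact Cypher shape and the case-insensitive flag `(?i)` are assumptions; only the three searched properties (`label`, `label_long`, `description`), the use of regular expressions and the allowed limits come from this article.

```r
# Hypothetical helper (not the app's actual code): build a Cypher query that
# looks for the user's text in the three text properties named in this
# article, using a case-insensitive regular expression (an assumption).
build_search_query <- function(text, limit = 50) {
  allowed_limits <- c(10, 20, 30, 40, 50)
  if (!limit %in% allowed_limits) {
    stop("limit must be one of: ", paste(allowed_limits, collapse = ", "))
  }
  # Escape backslashes, then single quotes, so that the text can sit inside
  # a single-quoted Cypher string literal; regex syntax is kept on purpose.
  safe <- gsub("\\\\", "\\\\\\\\", text)
  safe <- gsub("'", "\\\\'", safe)
  # =~ must match the whole property value, hence the surrounding .*
  pattern <- paste0("'(?i).*", safe, ".*'")
  fields <- c("label", "label_long", "description")
  where <- paste0("n.", fields, " =~ ", pattern, collapse = " OR ")
  paste0("MATCH (n) WHERE ", where, " RETURN n LIMIT ", as.integer(limit))
}

query <- build_search_query("zones humides", limit = 20)
cat(query, "\n")

# With a live database and the `con` object created earlier, the query could
# then be sent as a graph request, keeping only the nodes:
# res <- neo4r::call_neo4j(query, con, type = "graph")
# res$nodes
```

Building the string in one place keeps the limit check and the quoting rules next to each other; with a real user-facing search, passing the text as a query parameter (if the client supports it) would avoid manual escaping altogether.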

Each result is displayed in a button, with some information depending on the node type (among the five presented earlier). Non-truthy information (such as "null" or `NA` fields) is removed beforehand. Each node type is associated with a color code, allowing a quicker understanding of the displayed results. Results are also counted per type at the top of the result area.

Clicking on a result opens the graph pop-up. For technical reasons, the whole `checkboxGroupButtons()` is then updated with `selected = NA`, so that each button behaves like a clicked `actionButton()`: the click opens the pop-up but leaves no button selected.

Besides the results, the main page displays filters to apply to them: unchecking a box removes the matching results. Currently, only filtering by type is available (other criteria could be added).

### Graph pop-up

The pop-up is composed of three areas: the graph itself, the description and the contribution area.

#### Graph

The graph is rendered with {visNetwork}. It presents all nodes and relationships within two degrees of the clicked result. The legend lists the types of nodes presented. Clicking on a node replaces the description with that of the clicked node.

#### Description

The description is a list of the node's properties. Non-truthy information (such as "null" or `NA` fields) is removed beforehand. A "Navigate" button lets the user navigate to the node: a new graph centered on the clicked node is generated (with a Neo4J query), and the graph view and all its components (legend, title, ...) are updated in the GUI.

#### Contribution

This is specific to the case of the SIB. This form lets the user record a suggestion in a dedicated app file.
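
Recording a suggestion essentially means appending one row to that file. The sketch below is hypothetical: the article does not specify the file format, so it assumes a CSV file whose columns mirror the five fields listed in this section (`mail`, `target_type`, `target_name`, `action`, `commentary`); the function name `record_suggestion()` and the file name are illustrative.

```r
# Hypothetical sketch: append one suggestion as a row of a CSV file.
# Column names mirror the fields described in this section; the CSV format
# and the function name are assumptions, not the app's actual code.
record_suggestion <- function(file, mail, target_type, target_name,
                              action, commentary) {
  # Check values against the documented choices
  # (match.arg() also accepts unambiguous abbreviations)
  target_type <- match.arg(target_type, c("node", "relationship"))
  action <- match.arg(action, c("add", "edit", "remove"))
  row <- data.frame(
    mail = mail, target_type = target_type, target_name = target_name,
    action = action, commentary = commentary, stringsAsFactors = FALSE
  )
  # Write the header only when the file is created, then append rows
  new_file <- !file.exists(file)
  write.table(row, file, sep = ",", row.names = FALSE,
              col.names = new_file, append = !new_file, qmethod = "double")
  invisible(row)
}

# The app stores its file under $HOME; a temporary file is used here.
suggestions <- file.path(tempdir(), "suggestions.csv")
record_suggestion(suggestions, "jane.doe@example.org", "node", "SIB",
                  "edit", "The long label is outdated")
read.csv(suggestions)
```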

The file is a simple table composed of the following fields:

- mail: to identify the person writing the suggestion and allow further exchanges;
- target type: node or relationship;
- target name: depending on the target type, either the name of a node or the triplet "NODE RELATIONSHIP NODE";
- action: what shall be done on the target, either "add", "edit" or "remove". For "add", the user is expected to select a node that will be the neighbour of the new node;
- commentary: explains the reason for the modification, and some details of its implementation.

These suggestions are stored in a file in `$HOME`. Since the application is not meant to be deployed on the user's computer but as a service accessible online, this should not infringe on the user's rights.

## Dockerization

For reproducibility purposes and as a deployment solution, we chose Docker to embed our application environment. The `docker-compose.yml` file at the root of the git repository launches two services: Neo4J (from the official Docker image) and our application.

To ease the dockerization, we wrote the app as a package, called {shiny4cartosib}. Its function `launchApp()` is a shortcut to the application. Since the docker-compose file requires a few more tweaks, a bash script (`up.sh`) was written so that the whole deployment process can be launched with a single command line.

### Detail of the docker-compose

Since the dockerization process relies on a docker-compose file, it is useful to understand how it was built. Here is a detail of every section.

```{yml}
version: '3'
```

Standard header, with the docker-compose file format version. Version `3` was reused from another docker-compose file for Neo4J services.

```{yml}
services:
  neo4j:
    image: neo4j:3.5
    restart: unless-stopped
    container_name: neo4j
```

This starts the description of the services to launch. The first service is Neo4J, using version 3.5 for compatibility with the {neo4r} package.
The container is restarted whenever it stops (which might happen after a bad transaction with the database), unless it has been stopped explicitly.

```{yml}
    ports:
      - 7474:7474 # HTTP -- only required by {neo4r}
      - 7473:7473 # HTTPS
      - 7687:7687 # bolt
```

These are the default ports of the Neo4J service. `7474` is used by the Shiny app for any interaction. `7687` lets the user access the database through the Neo4J browser for any other purpose.

```{yml}
    volumes:
      - ./neo4j_setup/import/:/var/lib/neo4j/import/
      - ./neo4j_setup/plugins/:/var/lib/neo4j/plugins/
      - ./neo4j_setup/:/neo4j_setup
```

The `import` volume stores the data to be loaded into the database. The `plugins` volume contains the `.jar` files of the plugins you want to use. The `neo4j_setup` volume is a convenience folder in the git repository to store Neo4J-related files.

```{yml}
    environment:
      # Install APOC library
      - NEO4JLABS_PLUGINS='[\"apoc\"]'
```

This enables the use of the plugins contained in the `plugins` volume (here, APOC).

```{yml}
      # Allow files interaction
      - NEO4J_apoc_export_file_enabled=true
      - NEO4J_apoc_import_file_enabled=true
      - NEO4J_apoc_import_file_use__neo4j__config=true
      - NEO4J_dbms_security_allow__csv__import__from__file__urls=true
```

By default, the Neo4J database cannot interact with files on the system unless its configuration allows it. These lines allow the Neo4J process to read CSV node and relationship files from the `import` volume.

```{yml}
      # Allow port interaction
      - NEO4J_dbms_connector_http_enabled=true
      - NEO4J_dbms_connector_http_advertised__address=127.0.0.1:7474
      - NEO4J_dbms_connector_bolt_advertised__address=127.0.0.1:7687
```

Opening the ports might not be enough: these lines make sure interaction is allowed through `7474` (for {neo4r}) and force the advertised addresses to localhost.
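
On the R side, note that inside the `shiny_app` container, `localhost` points to that container itself, not to the database; within the docker-compose network, the database is reachable through its service name `neo4j`, since Compose makes each service reachable under its name on the default network. The article does not say how {shiny4cartosib} picks its URL, so the following is only a sketch: it assumes a hypothetical `NEO4J_HOST` environment variable, falling back to `localhost` for the local setup described earlier.

```r
# Hypothetical helper: build the HTTP URL used by {neo4r}. NEO4J_HOST is an
# assumed environment variable, e.g. set to "neo4j" (the service name) in
# the "environment" section of the shiny_app service; when it is unset, the
# local setup (Neo4J Desktop on localhost) is used.
neo4j_url <- function(port = 7474) {
  host <- Sys.getenv("NEO4J_HOST", unset = "localhost")
  sprintf("http://%s:%d", host, as.integer(port))
}

neo4j_url()

# Connection as shown earlier, with the URL resolved at run time:
# con <- neo4r::neo4j_api$new(url = neo4j_url(),
#                             user = "neo4j", password = "neo4j")
```

Reading the host from the environment keeps a single code path for both the local and the dockerized setups.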

```{yml}
      # Raise memory limits
      - NEO4J_dbms_memory_pagecache_size=1G
      - NEO4J_dbms_memory_heap_initial__size=1G
      - NEO4J_dbms_memory_heap_max__size=1G
```

For convenience, these lines raise the memory-related limits of the database. As with the other settings, dots in a Neo4J setting name become `_` and underscores become `__` in the variable name (`dbms.memory.heap.initial_size` becomes `NEO4J_dbms_memory_heap_initial__size`).

```{yml}
      # Authentication setup
      - NEO4J_USERNAME=neo4j
      - NEO4J_PASSWORD=neo4j
      - NEO4J_AUTH=neo4j/neo4j
      - NEO4J_DATABASE=SIF
```

The first three lines above are redundant: the first two are equivalent to the third, and are only shown here to present the different ways to set the authentication variables. Don't worry about the password appearing in clear text on the local machine: it will be encrypted once the container is deployed.

```{yml}
  shiny_app:
    build: .
    image: my_shiny_app
    container_name: shiny
    ports:
      - 3838:3838
```

This part defines the Docker configuration of the Shiny app. It expects the Dockerfile to be located in the same directory as the docker-compose file. Port `3838` is the port commonly used for Shiny apps.

## Main known issues

The modularization of the application was designed to separate the features as much as possible. However, some modules (such as Query and Criterions) might be underused in their context; they are nonetheless ready if features have to be added to those parts.

The database browser has limited features: it queries the Neo4J database by node type, and only looks for text in up to three fields (`label`, `label_long` and `description`). It relies on regular expressions, but does not allow approximate matching.
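
To make this limitation concrete, the sketch below emulates this kind of search on a small in-memory table rather than on Neo4J: the same regular expression is applied, case-insensitively (an assumption), to the three fields, and a row matches if any field does. The sample rows are made up for the example, and `search_nodes()` is a hypothetical helper.

```r
# Made-up sample nodes, for illustration only; NA stands for a missing field
nodes <- data.frame(
  label = c("SIB", "Bird survey", "Photo app"),
  label_long = c("Biodiversity Information System",
                 "National bird survey dataset", NA),
  description = c(NA, "Yearly counts of breeding birds",
                  "Species identification from pictures"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows where the regular expression matches at
# least one of the three searched fields (grepl() returns FALSE for NA)
search_nodes <- function(nodes, pattern) {
  fields <- c("label", "label_long", "description")
  hits <- Reduce(`|`, lapply(fields, function(f) {
    grepl(pattern, nodes[[f]], ignore.case = TRUE)
  }))
  nodes[hits, , drop = FALSE]
}

search_nodes(nodes, "biodiversity")$label    # "SIB"
search_nodes(nodes, "bird|species")$label    # "Bird survey" "Photo app"
nrow(search_nodes(nodes, "biodiverstiy"))    # 0: a typo finds nothing
```

Regular expressions give the user alternatives (`bird|species`) for free, but a single transposed letter is enough to return no result at all.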

## References

- {neo4r} package, Colin Fay et al., 2019-06-03: https://github.com/neo4j-rstats/neo4r
- {visNetwork} package, Almende B.V., Benoit Thieurmel, Titouan Robert, 2019-12-06: https://cran.r-project.org/web/packages/visNetwork/index.html
- Neo4J official website: https://neo4j.com/
- "How To Dockerize ShinyApps", Oliver Guggenbühl, 2020-05-15: https://www.statworx.com/at/blog/how-to-dockerize-shinyapps/

List of R packages used (available on CRAN):

- config (v.0.3.1)
- data.table (v.1.14.0)
- dplyr (v.1.0.5)
- golem (v.0.2.1)
- neo4r (v.0.1.1)
- sendmailR (v.1.2-1)
- shiny (v.1.6.0)
- shinyFeedback (v.0.3.0)
- shinyjs (v.2.0.0)
- shinyWidgets (v.0.5.7)
- stringr (v.1.4.0)
- visNetwork (v.2.0.9)