# Deploying cBioPortal with SAML Authentication
## Overview
This guide documents the steps for setting up and configuring [cBioPortal](https://www.cbioportal.org/) for secure use by UCLA Pathology. We also use Keycloak, integrated with our enterprise identity provider Okta, to manage users and their level of access to cBioPortal studies.
This guide closely mirrors the [cBioPortal deployment docs](https://docs.cbioportal.org/deployment/), but includes tweaks and small changes specific to this deployment.
### Dependencies
Docker, Java (JRE), Keycloak
## Server Setup
### VM and OS
The technical work here is handled by ISS, but we need to provide them some information:
- Our initial filesystem setup (January 2023). We allocate 60G to `/opt` as we will have Docker store its data there, which can include large images.
```bash
Filesystem Size Used Avail Use% Mounted on
devtmpfs 4.0M 8.0K 4.0M 1% /dev
tmpfs 7.8G 384K 7.8G 1% /dev/shm
tmpfs 3.1G 286M 2.9G 10% /run
tmpfs 4.0M 0 4.0M 0% /sys/fs/cgroup
/dev/mapper/rootvg-root 2.0G 815M 1.1G 45% /
/dev/mapper/rootvg-usr 6.8G 3.1G 3.4G 47% /usr
/dev/mapper/rootvg-opt 60G 3.3G 55G 6% /opt
/dev/mapper/rootvg-var 2.0G 608M 1.3G 33% /var
/dev/sda1 986M 162M 774M 18% /boot
/dev/mapper/rootvg-home 2.0G 131M 1.7G 8% /home
/dev/mapper/rootvg-tmp 2.4G 1.8G 531M 78% /tmp
/dev/mapper/rootvg-home2 2.0G 72M 1.8G 4% /home2
tmpfs 1.6G 0 1.6G 0% /run/user/102129
```
We used SUSE Linux 15.3 for our OS.
### Network
We requested the following port configuration from the firewall team. This config reflects that we will serve cBioPortal on port 8080 and, in the future, Keycloak on port 8180.
```
Source Server Port Target Server Service Protocol
10.16.103.99 (lipcbioap01) 22,80,8080,8180,443 10.250.0.0/16 http TCP/IP
10.16.103.99 (lipcbioap01) 22,80,8080,8180,443 10.1.0.0/16 http TCP/IP
10.16.103.99 (lipcbioap01) 22,80,8080,8180,443 10.44.202.1/16 http TCP/IP
```
The target server ranges correspond to workstations in various parts of MDL:
- `10.250.0.0/16`: VPN users
- `10.1.0.0/16`: machines in A6-214
- `10.44.202.1/16`: machines in Cyriac's office
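To check whether a particular workstation address falls inside one of these ranges, a small helper can compare addresses under the prefix mask. This is only an illustrative sketch: `ip_to_int` and `in_cidr` are hypothetical helper names, not part of any firewall tooling, and the check assumes standard CIDR matching (so a `/16` rule such as `10.44.202.1/16` is compared on its first two octets only).

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (exit 0) if address $1 lies inside the CIDR range $2.
in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}"
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}
```

For example, `in_cidr 10.250.3.7 10.250.0.0/16 && echo "covered"` reports whether a VPN address is covered by the first rule.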
### Docker
By default, Docker `20.10.17-ce` stores its data, including images, in `/var/lib/docker`. In our deployment, we store this data under `/opt/` and use `/var/` strictly for logs.
First, ensure all Docker services are stopped.
```bash
sudo systemctl stop docker
sudo systemctl stop docker.service
sudo systemctl stop docker.socket
```
Then move the Docker root to `/opt`:
```bash
sudo mv /var/lib/docker /opt
```
Next, add a line to `/etc/docker/daemon.json` to tell the Docker daemon the new data root location.
```json
{
"data-root": "/opt/docker"
}
```
Finally, start the Docker service and ensure no errors occur.
```bash
sudo systemctl start docker
```
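One way to confirm the move took effect is to compare the data root that the daemon reports with the expected path. The sketch below assumes the `/opt/docker` location used above; `check_data_root` is a hypothetical helper name introduced only for illustration.

```bash
# Compare the data root Docker reports with the one we expect.
check_data_root() {
  local expected="$1" actual="$2"
  if [ "$actual" = "$expected" ]; then
    echo "OK: Docker data root is $actual"
  else
    echo "MISMATCH: expected $expected, got $actual" >&2
    return 1
  fi
}

# Usage (requires a running Docker daemon):
#   check_data_root /opt/docker "$(docker info --format '{{ .DockerRootDir }}')"
```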
### Certificate
To serve the site over HTTPS, we need a certificate signed by a Certificate Authority (CA).
Start by creating a new private key, called `example.key` in our case. This will prompt you for a password.
```bash
openssl genrsa -des3 -out example.key 2048
```
After the key is successfully created, save the password you used to a `.pass` file:
```bash
echo '<YOUR_PASSWORD>' > example.pass
chmod 600 example.pass
```
Next, generate a Certificate Signing Request (CSR). It will prompt you for information about your organization, email, etc. **ISS requires that you use your FQDN for the Common Name**.
```bash
openssl req -new -key example.key -out example.csr
```
This CSR can then be given to a Certificate Authority (CA), who will sign it and give you a certificate file (`.cer` or `.crt`) that `nginx` can use.
**NOTE:** For certs that we have received from UCLA, the cert chain is improperly ordered, causing `nginx` to throw `error:0B080074:x509 certificate routines:X509_check_private_key:key values mismatch`. You can check if this is the case by inspecting the `.cer` file with `openssl x509 -noout -text -in example.cer`. The subject line should match your distinguished name from the previous step. If it looks like information from the CA, **move the last certificate in the chain to the beginning.**
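The reordering can be done by hand in a text editor, or scripted. The following is a minimal sketch that assumes the `.cer` file is a plain PEM bundle in which each certificate starts with a `-----BEGIN CERTIFICATE-----` line; `reorder_chain` is a hypothetical helper name, and the output file name in the usage note is only an example.

```bash
# Move the last PEM certificate block in a chain file to the front.
# Reads the chain from the file in $1 and writes the reordered chain to stdout.
reorder_chain() {
  awk '
    /-----BEGIN CERTIFICATE-----/ { n++ }          # start of a new cert block
    n > 0 { block[n] = block[n] $0 "\n" }          # accumulate lines per block
    END {
      if (n == 0) exit 1                            # no certificates found
      printf "%s", block[n]                         # last cert goes first
      for (i = 1; i < n; i++) printf "%s", block[i] # then the rest, in order
    }
  ' "$1"
}
```

For example, `reorder_chain example.cer > example-fixed.cer`, then re-run `openssl x509 -noout -subject -in example-fixed.cer` to confirm the subject is now your own.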
### Nginx
First, start the `nginx` service:
```bash
sudo systemctl start nginx
```
and verify that it is listening on port 80:
```bash
sudo lsof -i:80
```
Add the following to your `nginx.conf`, again replacing bracketed values with your own. In our case, this file was located at `/etc/nginx/nginx.conf`.
```
http {
    ...
    server {
        listen 80;
        listen 443 ssl;
        server_name <your DNS 1> <your DNS 2>;
        ssl_certificate </path/to/your/example.cer>;
        ssl_certificate_key </path/to/your/example.key>;
        ssl_password_file </path/to/your/example.pass>;
        ...
        location / {
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_set_header Host $http_host;
            proxy_pass "http://127.0.0.1:8080";
        }
        ...
    }
}
```
Finally, restart nginx:
```bash
sudo systemctl restart nginx
```
If this command succeeds, your web app should now be reachable over HTTPS, with `nginx` proxying requests on ports 80 and 443 to cBioPortal on port 8080.
### Keytool
[keytool](https://docs.oracle.com/javase/8/docs/technotes/tools/windows/keytool.html) is a Java utility for managing keys and certificates. We use it to create the key that our application will use as a signing certificate.
First, install a Java Runtime Environment (JRE) if you do not already have a Java installation. You can also use a JDK, but we only require a JRE.
```bash
sudo zypper install default-jre
```
Then, create a keystore using `keytool`.
```bash
keytool -genkey -alias secure-key -keyalg RSA -keystore samlKeystore.jks
```
This will create a Java keystore for a key called `secure-key` and place the keystore in a file named `samlKeystore.jks`. You will be prompted for:
- keystore password (required, for example: `apollo1`)
- your name, organization and location (optional)
- key password for secure-key (required, for example `apollo2`)
## SAML Authentication
<!-- ### Keycloak
Keycloak is a tool supported by cBioPortal that manages authorization, authentication, and user management. Keycloak also supports external Identity Providers (IDP), such as Okta (used by UCLA), using them to provide users and authorization levels.
Keycloak deploys as a server and a MySQL database, both in Docker containers.
First, set up a local network for Keycloak to communicate with its database:
```bash!
docker network create kcnet
```
Then, start the database container:
```bash!
docker run -d --restart=always \
--name=kcdb \
--net=kcnet \
-v "/home/svcpathx/cbio/kcdb-files:/var/lib/mysql" \
-e MYSQL_DATABASE=keycloak \
-e MYSQL_USER=keycloak \
-e MYSQL_PASSWORD=password \
-e MYSQL_ROOT_PASSWORD=root_password \
mysql:5.7
```
Finally, we'll start the Keycloak server itself. Note we run this on port 8180, and
**WARNING:** The Keycloak server will be visible on your network. Choose a strong admin password.
```bash!
docker run -d --restart=always \
--name=cbiokc \
--net=kcnet \
-p 8180:8080 \
-e DB_VENDOR=mysql \
-e DB_ADDR=kcdb \
-e KEYCLOAK_USER=admin \
-e "KEYCLOAK_PASSWORD=root_password" \
-e KC_SPI_TRUSTSTORE_FILE_FILE=samlKeystore.jks \
-e "KC_SPI_TRUSTSTORE_FILE_PASSWORD=apollo1" \
-e KC_SPI_TRUSTSTORE_FILE_HOSTNAME_VERIFICATION_POLICY=ANY \
-v /home/svcpathx/cbio/saml/samlKeystore.jks:/cbioportal-webapp/WEB-INF/classes/samlKeystore.jks \
jboss/keycloak:4.8.3.Final
```
Then, create the MySQL database that Keycloak will use:
**NOTE:** The directory `kcdb-files` will be where Keycloak stores its data. In our case, we used `/home/svcpathx/cbio/kcdb-files/`.
```bash
docker run -d --restart=always \
--name=kcdb \
--net=kcnet \
-v "/home/svcpathx/cbio/kcdb-files:/var/lib/mysql" \
-e MYSQL_DATABASE=keycloak \
-e MYSQL_USER=keycloak \
-e MYSQL_PASSWORD=password \
-e MYSQL_ROOT_PASSWORD=root_password \
mysql:5.7
```
-->
### cBioPortal Configuration
Modify `config/portal.properties` to reflect the following settings:
```
saml.sp.metadata.entitybaseurl=https://cbio.mednet.ucla.edu:443
saml.idp.metadata.location=classpath:/client-tailored-saml-idp-metadata.xml
saml.idp.metadata.entityid=https://<YOUR_FQDN>:8180/auth/realms/cbioportal
saml.keystore.location=classpath:/samlKeystore.jks
saml.keystore.password=apollo1
saml.keystore.private-key.key=secure-key
saml.keystore.private-key.password=apollo2
saml.keystore.default-key=secure-key
saml.custom.userservice.class=org.cbioportal.security.spring.authentication.keycloak.SAMLUserDetailsServiceImpl
saml.logout.local=false
saml.logout.url=/
```
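Before restarting the portal, it can help to confirm that every SAML key listed above is actually present in the properties file. The sketch below is one way to do that; `missing_saml_keys` is a hypothetical helper name, and it checks only that each key is set on an uncommented line, not that the value is correct.

```bash
# Print each SAML key from the list above that is missing from the
# properties file given as $1. Returns non-zero if any key is missing.
missing_saml_keys() {
  local file="$1" key status=0
  for key in \
    saml.sp.metadata.entitybaseurl \
    saml.idp.metadata.location \
    saml.idp.metadata.entityid \
    saml.keystore.location \
    saml.keystore.password \
    saml.keystore.private-key.key \
    saml.keystore.private-key.password \
    saml.keystore.default-key \
    saml.custom.userservice.class \
    saml.logout.local \
    saml.logout.url
  do
    # A key counts as set only if it starts an uncommented "key=" line.
    if ! grep -q "^${key}=" "$file"; then
      echo "$key"
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_saml_keys config/portal.properties` prints nothing and exits successfully when all keys are set.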
Next, modify `docker-compose.yaml` as follows, leaving the existing volumes as they are.
The only change to `command:` is adding `-Dauthenticate=saml` (TODO: `--proxy-base-url`).
```
services:
  cbioportal:
    volumes:
      - ...
      - ./samlKeystore.jks:/cbioportal-webapp/WEB-INF/classes/samlKeystore.jks
      - ./client-tailored-saml-idp-metadata.xml:/cbioportal-webapp/WEB-INF/classes/client-tailored-saml-idp-metadata.xml
    ...
    command: /bin/sh -c "java -Xms2g -Xmx4g -Dauthenticate=saml -Dsession.service.url=http://cbioportal-session:5000/api/sessions/my_portal/ -jar webapp-runner.jar -AmaxHttpHeaderSize=16384 -AconnectionTimeout=20000 --enable-compression /cbioportal-webapp"
```
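Both mounted files must exist next to `docker-compose.yaml` before the container starts. With this short bind-mount syntax, Docker typically creates an empty directory at a missing host path, which then fails in confusing ways inside the container. The sketch below checks for the two files named above; `check_mount_sources` is a hypothetical helper name.

```bash
# Verify that the files bind-mounted into the cbioportal container exist
# as regular files in the directory given as $1.
check_mount_sources() {
  local dir="$1" f status=0
  for f in samlKeystore.jks client-tailored-saml-idp-metadata.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing or not a regular file: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, run `check_mount_sources .` from the directory containing `docker-compose.yaml` before starting the services.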
### Database Configuration
Use `docker ps` to find the name of the container running your MySQL database. In this case, it's `7e85dd90fa56_cbioportal-database-container`.

Now, open a MySQL shell inside that container, replacing the example name with your container's name from the previous step. You will be prompted for the password.
```bash
cd cbioportal-docker-compose/
docker exec -it 7e85dd90fa56_cbioportal-database-container mysql -u cbio_user -p
```
By default, the login for this account is as follows:
```
username: cbio_user
password: somepassword
```
See the available databases by running `SHOW DATABASES;` at the MySQL prompt.

Connect to the `cbioportal` database:
```
USE cbioportal;
```
Now, execute the following commands to create an admin user that has authority to access all studies.
**NOTE**: Replace the example email and username with your desired credentials. Ensure that the email matches exactly in both entries.
```SQL
INSERT INTO cbioportal.users (EMAIL, NAME, ENABLED)
VALUES ('ian@example.com', 'Ian', 1);
```
```SQL
INSERT INTO cbioportal.authorities (EMAIL, AUTHORITY)
VALUES ('ian@example.com', 'cbioportal:ALL');
```
## Importing Studies
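Importing a study takes two steps: first dump the portal's metadata to a local directory, then validate and import the study against that dump.
```bash
# Step 1: dump the portal's metadata into /portalinfo
docker-compose run \
-v "/home/svcpathx/cbio/portalinfo:/portalinfo" \
-w /cbioportal/core/src/main/scripts \
cbioportal \
./dumpPortalInfo.pl /portalinfo

# Step 2: validate and import the study, writing an HTML report to /report
docker-compose run \
-v "/home/svcpathx/cbio/report:/report" \
-v "/home/svcpathx/cbio/portalinfo:/portalinfo:ro" \
cbioportal \
metaImport.py -p /portalinfo \
-s /study/mdl_heme_2020 --html=/report/report.html
```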