# Data Exploration of HuBMAP data on Bridges-2: HuBMAP’s scRNA-seq Data Analysis, Venn diagrams, and Gene Ontology Enrichment Analysis (GOEA)
## About this workshop
This is the documentation for the full-day workshop hosted at the Universidad Ana G. Mendez. This workshop is designed for professionals and enthusiasts seeking to deepen their exploration of data in HuBMAP.
The Human Biomolecular Atlas Program (HuBMAP) is a program funded by the US National Institutes of Health to characterize the human body at single-cell resolution, integrated with other efforts such as the Human Cell Atlas.
You are invited to attend a comprehensive, two-day workshop hosted at the Universidad Ana G. Mendez. This workshop is designed for professionals and enthusiasts seeking to deepen their understanding of data exploration and analysis with Python, single-cell RNA sequencing (scRNA-seq) analysis of data from the Human Biomolecular Atlas Program (HuBMAP), Venn diagrams, and Gene Ontology Enrichment Analysis (GOEA).
This workshop is open to members of the PR-INBRE Bioinformatics Community of Practice (BiCoP).
Who should attend? Faculty, research staff, and students looking to learn more about exploring public HuBMAP data using Open OnDemand on Bridges-2.
Why should I attend? Data exploration from HuBMAP public data within the context of open science and FAIR principles fosters collaborative research and knowledge discovery. Researchers can freely access and analyze diverse datasets encompassing human tissue samples, promoting transparency and reproducibility in biomedical research. By adhering to FAIR principles, HuBMAP ensures that data are findable, accessible, interoperable, and reusable, facilitating comprehensive exploration of human biology and disease mechanisms across scientific disciplines. This open approach empowers scientists to uncover novel insights into tissue architecture, cellular interactions, and molecular profiles, advancing our understanding of human health and paving the way for innovative medical interventions.
## Before we begin
- :warning: Have an issue or question?
  - Feel free to ask during the presentation, in the chat, or on Slack
  - Send an email to the Help Desk at `help@psc.edu` after the workshop
- :computer: What is the project charge ID?
  - `see240003p`
- :computer: What is the reservation name?
  - `RMhubmap`
- :computer: Where can I find the code and data?
  - The code and data are located in `/ocean/projects/see240003p/shared`
  - The code can be found in this [repo](https://github.com/pscedu/practical-intro-to-apptainer)
- :computer: Where can I find the docs?
  - You can find the documentation [here](https://hackmd.io/@icaoberg/rJLDxPo6p).
## Resources available during this experience
* 20 regular-memory compute nodes that can be accessed using SLURM from the partition named `RM-shared` and reservation `RMhubmap`.
* Use Open OnDemand to connect to Bridges-2 at `https://ondemand.bridges2.psc.edu`.
* Bridges-2 official [documentation](https://www.psc.edu/resources/bridges-2/).
* OnDemand official [documentation](https://osc.github.io/ood-documentation/latest/).
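
For attendees who prefer a terminal to OnDemand, the partition and reservation listed above map onto standard SLURM options. The snippet below is a minimal sketch, not the workshop's required workflow: it only assembles and prints an `srun` command. The partition, reservation, and project charge ID come from this page; the task count and time limit are illustrative assumptions.

```bash
# Values taken from this page.
partition="RM-shared"
reservation="RMhubmap"
account="see240003p"   # project charge ID

# Assumptions for illustration only: one task, one hour.
ntasks=1
time_limit="01:00:00"

# Assemble an interactive-shell request using standard srun options.
cmd="srun --partition=${partition} --reservation=${reservation} --account=${account} --ntasks=${ntasks} --time=${time_limit} --pty bash"

# Print the command; copy and run it on a Bridges-2 login node.
echo "${cmd}"
```

Printing the command instead of running it keeps the sketch harmless on machines where SLURM is not installed.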
## What to expect
* A gentle but practical introduction to OnDemand, including how to connect and start JupyterLab.
* An introduction to exploring HuBMAP public data in the web portal.
* How to access public HuBMAP data on Bridges-2 for exploration and analytics.
## Setup
:::warning
You should receive an email from PSC with your username. The username to connect to our infrastructure might not be the same as your ACCESS ID.
If you need to reset your PSC account password, then please visit `https://apr.psc.edu/`.
:::
# Exploring the HuBMAP portal
Visit `https://portal.hubmapconsortium.org/` for the main landing page of the project.
There are different entity types in our provenance, such as:
* donors
* tissues
* samples
* datasets
In this workshop we will focus on public datasets.
To explore the datasets, click `Datasets`

and that should open the Datasets page

On this page you can explore the metadata of the public datasets.
## Exploring CODEX
On the left-hand side, select `CODEX`

and then select the second dataset in the list, `HBM734.XBSR.357`

and in turn this will open the dataset landing page.
:::info
Imaging assays can be visually inspected directly on the site.
:::

:::info
Most datasets have a minted DOI.

A DOI (Digital Object Identifier) is a unique and never-changing string assigned to online works.
:::
Some datasets are versioned.
### Where is the dataset path on Bridges-2?
Check the URL for this dataset.

You can extract the UUID from the URL:
```
804df200e0003180cc5a62493ea5dced
```
The path on disk on Bridges-2 should be:
```
/hive/hubmap/data/public/804df200e0003180cc5a62493ea5dced
```
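
The URL-to-path step above can be sketched in the terminal as follows. This is a minimal example under stated assumptions: the `url` value is a hypothetical dataset page address whose last path segment is the UUID (check the address bar for the actual form), and the public data root is the one shown above. The directory listing only runs when the path is visible from the current machine.

```bash
# Hypothetical dataset URL; only the trailing UUID is taken from this page.
url="https://portal.hubmapconsortium.org/browse/dataset/804df200e0003180cc5a62493ea5dced"

# Keep everything after the last "/" (assumed to be the UUID).
uuid="${url##*/}"

# Public datasets are stored under this root on Bridges-2.
dataset_path="/hive/hubmap/data/public/${uuid}"
echo "${dataset_path}"

# List the dataset only if the directory is reachable from here.
if [ -d "${dataset_path}" ]; then
    ls "${dataset_path}"
else
    echo "Not found (are you on Bridges-2?): ${dataset_path}"
fi
```

If the copied URL carries a trailing slash or query string, strip it before extracting the UUID; the parameter expansion used here does not handle those cases.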
# Connecting to OnDemand
Connect to OnDemand on [Bridges-2](https://ondemand.bridges2.psc.edu) and sign in with your Bridges-2 login (not your ACCESS ID)

and it should look like

click `JupyterLab`

and complete the form to look like this

:::info
The string in the extra arguments section is
```
-n 8 --mem=16000M --reservation RMhubmap
```
:::
then click `Launch`

and wait until you can click `Connect to Jupyter`

Once it starts it should look similar to

and click `Terminal`

In the terminal, type:
```bash=
cd ~/
git clone https://github.com/hubmapconsortium/hubmap-data-exploration-workshop.git
cd hubmap-data-exploration-workshop
```