# NeuroCausal Project Technical Path to Follow
## 1. Accessing Data via Brainhack Cloud VM
* DM @isil your email address to be added to [Brainhack Cloud](http://brainhack.org/brainhack_cloud/)
* Accept the invitation
P.S. This is enough for us to add you to the project's users, so you will be able to use the VM.
* Send @isil your SSH public key. We will add it to the VM so you can ssh into the virtual machine by following the instructions [here](http://brainhack.org/brainhack_cloud/tutorials/vm/#connect-to-the-vm-using-the-oracle-key-pair) for your operating system.
Regardless of who is connecting, the user name to ssh into the VM is `opc`.
The IP of the VM is `129.159.195.204`.
From your operating system's terminal or ssh client, connect to the VM via
`ssh opc@129.159.195.204`
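If ssh does not pick up the Oracle key pair automatically, you can point it at the private key explicitly. The key path below is only an assumption; use wherever you saved the key from the tutorial linked above:

```bash
# Connect as the shared "opc" user; -i points ssh at the Oracle private key.
# The path ~/.ssh/oracle_vm_key is an assumption -- substitute your own key location.
ssh -i ~/.ssh/oracle_vm_key opc@129.159.195.204
```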
* Navigate to the `raw_data` folder. **Bold** names in the tree below refer to subfolders.
**raw_data**
└── **query_ddf0af6c4b05fbed622e207b1afe24f7**
  ├── **articles** (_pubget output: the downloaded papers, each as a separate xml file in its own folder._)
  │ ├── 019
  │ │ └── pmcid_1253530
  │ │   ├── article.xml
  │ │   └── tables
  │ │     └── tables.xml
  │ ├── 01f
  │ │ └── pmcid_1664585
  │ │   ├── article.xml
  │ │   └── tables
  │ │     ├── table_000.csv
  │ │     ├── table_000_info.json
  │ │     ├── table_001.csv
  │ │     ├── table_001_info.json
  │ │     └── tables.xml
  │ ├── ...
  │ └── info.json
  ├── **articlesets** (_pubget output: the downloaded papers combined into articleset xml files._)
  │ ├── articleset_00000.xml
  │ ├── ...
  │ └── info.json
  ├── **subset_articlesWithCoords_extractedData**
  │ ├── authors.csv
  │ ├── coordinates.csv
  │ ├── coordinate_space.csv
  │ ├── info.json
  │ ├── links.csv
  │ ├── metadata.csv
  │ └── text.csv
  ├── **chunked_data** (_not a pubget output: the data divided into operable chunks, to ease further processing, using the code [here](https://github.com/neurocausal/neurocausal_meta/blob/main/clean_nq_data.py)._)
  ├── **filtered_data** (_not a pubget output: the data filtered using the clinical and technical filtering process defined **[here](https://github.com/neurocausal/neurocausal_meta/blob/main/filter_clinical.py)**._)
  └── **subset_allArticles-voc_e6f7a7e9c6ebc4fb81118ccabfee8bd7_vectorizedText**
Depending on the data you need to work with, focus on the folder level you are interested in.
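For example, a quick way to peek at the coordinate data once you are on the VM is shown below. The location of `raw_data` under the `opc` home directory is an assumption; adjust the path if the folder lives elsewhere:

```bash
# Move into the query folder (assuming raw_data sits in opc's home directory).
cd ~/raw_data/query_ddf0af6c4b05fbed622e207b1afe24f7

# List the extracted-data subset and preview the first few coordinate rows.
ls subset_articlesWithCoords_extractedData
head -n 5 subset_articlesWithCoords_extractedData/coordinates.csv
```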
## 2. Downloading Papers for a New Query with Pubget
To build and investigate a new query and its outputs, use [PMC's advanced search](https://www.ncbi.nlm.nih.gov/pmc/advanced).
Once you have established your query, follow the steps described in [pubget's repository](https://github.com/neuroquery/pubget).
To do this you will need your own NCBI API key, which you can obtain as described in [NCBI's API key announcement](https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/).
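A minimal sketch of a pubget run, assuming the `pubget run` command and `-q` query flag described in pubget's README; the output directory name and the query string below are only placeholders, and the exact way to pass your API key should be checked with `pubget run --help` or in pubget's documentation:

```bash
# Install pubget (ideally inside a virtual environment).
pip install pubget

# Download and process everything matching the query into ./pubget_data.
# The query string is only an example; paste the one you built with PMC advanced search.
pubget run ./pubget_data -q "aphasia[Abstract] AND lesion[Abstract]"
```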