# NeuroCausal Project Technical Path to Follow

## 1. Accessing Data via Brainhack Cloud VM

* DM @isil your email address to be added to [Brainhack Cloud](http://brainhack.org/brainhack_cloud/).
* Accept the invitation. P.s. this is enough for us to assign you to the project users, so you will be able to use the VM.
* Send @isil your ssh key. We will add it to the VM so you can ssh into the virtual machine by following the instructions [here](http://brainhack.org/brainhack_cloud/tutorials/vm/#connect-to-the-vm-using-the-oracle-key-pair) for your operating system.
  * Regardless of which user you are, the username to ssh into the VM is `opc`.
  * The IP of the VM is `129.159.195.204`.
  * So, depending on your operating system, ssh into the VM with `ssh opc@129.159.195.204`.
* Navigate to the `raw_data` folder. **Bold** entries in the tree below are sub-folders.

**raw_data**
└── **query_ddf0af6c4b05fbed622e207b1afe24f7**
    ├── **articles** (_pubget output: downloaded papers, each in a separate xml file in a separate folder_)
    │   ├── 019
    │   │   └── pmcid_1253530
    │   │       ├── article.xml
    │   │       └── tables
    │   │           └── tables.xml
    │   ├── 01f
    │   │   └── pmcid_1664585
    │   │       ├── article.xml
    │   │       └── tables
    │   │           ├── table_000.csv
    │   │           ├── table_000_info.json
    │   │           ├── table_001.csv
    │   │           ├── table_001_info.json
    │   │           └── tables.xml
    │   ├── ...
    │   └── info.json
    ├── **articlesets** (_pubget output: downloaded papers combined into xml files_)
    │   ├── articleset_00000.xml
    │   ├── ...
    │   └── info.json
    ├── **subset_articlesWithCoords_extractedData** (_pubget output: data extracted from the articles that report coordinates_)
    │   ├── authors.csv
    │   ├── coordinates.csv
    │   ├── coordinate_space.csv
    │   ├── info.json
    │   ├── links.csv
    │   ├── metadata.csv
    │   └── text.csv
    ├── **chunked_data** (_not a pubget output: the data divided into operable chunks to make further processing easier, using the code [here](https://github.com/neurocausal/neurocausal_meta/blob/main/clean_nq_data.py)_)
    ├── **filtered_data** (_not a pubget output: the data filtered with the clinical and technical filtering process defined **[here](https://github.com/neurocausal/neurocausal_meta/blob/main/filter_clinical.py)**_)
    └── **subset_allArticles-voc_e6f7a7e9c6ebc4fb81118ccabfee8bd7_vectorizedText**

Depending on the data you need to work on, focus on the level you are interested in.

## 2. Downloading Papers of a New Query with Pubget

To build and test a new query and inspect its results, use the [PMC advanced search builder](https://www.ncbi.nlm.nih.gov/pmc/advanced). Once you have established your query, follow the steps described in [pubget's repository](https://github.com/neuroquery/pubget). To do this you will need your own PubMed API key, which you can obtain as described in [PubMed's API key announcement](https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/).
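
As a rough illustration (not part of the project pipeline), a pubget download for a new query looks roughly like the sketch below. The output directory, query string, and API key are placeholders, and the flag and environment-variable names are taken from pubget's README at the time of writing, so verify them with `pubget run --help` for the version you install.

```bash
# Install pubget (ideally in a virtual environment).
pip install pubget

# Placeholder API key: use the key generated from your own NCBI account.
# Per pubget's documentation the key can also be passed with --api_key.
export PUBGET_API_KEY="your-ncbi-api-key"

# Placeholder output directory and query string: replace the query with the
# one you built in the PMC advanced search builder.
pubget run ./pubget_data --query "aphasia[Abstract] AND lesion[Abstract]"
```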
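
Once the run finishes, the output follows the same layout as the `raw_data` tree in section 1: a `query_<hash>` directory containing `articles`, `articlesets`, and the extracted-data sub-folders. The hash below is a placeholder; a quick sanity check could look like this:

```bash
# Placeholder query hash: use the directory pubget actually creates.
cd pubget_data/query_0123456789abcdef0123456789abcdef

# List the extracted-data files and peek at the peak coordinates and metadata.
ls subset_articlesWithCoords_extractedData
head subset_articlesWithCoords_extractedData/coordinates.csv
head subset_articlesWithCoords_extractedData/metadata.csv
```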