# 1. Transfer Genome Sequence Data to the Cluster Using `sFTP`
###### tags: `2. Main Steps` `tutorials` `sftp` `genome sequence data`
**Note that we used Genewiz for short read sequencing services, so the steps below follow those instructions. I imagine it's similar using sFTP from other companies.
## Setups
1. Check that you have all the correspondence you need: host, username, and password.
2. Make sure you have access to Duke Compute Cluster (DCC). This instruction assumes that you will use `sftp` (stands for 'secure file transfer protocol') to transfer the genome sequence file *directly onto the cluster*, bypassing your local computer. This means the genome sequence data will *not*, in any given time, be stored in your local computer.
3. What is cluster? What is ssh? We are working on a document that outlines this stuff in a hopefully understandable way!
## Instructions
1. Log into your cluster using ssh
2. Type in `sftp user.name@sftp.genewiz.com`
* `sftp` stands for "secure file transfer protocol"
* Do *not* use `user.name` literally, use the actual username given by Genewiz
* This command lets you access the file storage of Genewiz specifically allocated for your username
3. Type `yes` to the prompted command about the key fingerprints. (Essentially, fingerprint is your computer/user's identity, which the system has yet to recognize if this is your first time logging in.)
4. Enter the password provided by Genewiz
* Don't worry if you type in and nothing shows up. Normally, typing in passcode on terminal does not show '********' like normal browser everyday does. You ARE typing something, just that nothing changes on the screen.
* **Pro tip:** Sarah found out that you can copy and paste the password
6. Set up the local directory where you want to download these files into
```
lcd /datacommons/noor2/lethal_seq/all_lethal_fastq
```
7. Navigate through sftp folders until you reach the directory that contains all fastq files
* Use `ls` to list all file and directories within your working directory
* Use `pwd` to prints the path to the current working directory
* Remember to use `cd folder_name` to navigate to the lower level directory
* To get out of your current directory up one level--go to the 'parental directory,' you can use `cd ..` command.
* If you don't remember all these commands. Go [here](https://github.com/Noor-WGS-data/Genome_sequence_data/blob/main/Tutorials/useful_unix_commands.md).
9. Once you are **INSIDE** the directory containing all fastq files you want, there are two ways to download the files:
* Download all files within the project folder using `mget *`
* Download the folder AND everything within that folder by using **recursive** command `get -r Parent_folder`
For detailed instructions provided by Genewiz go [here.](https://f.hubspotusercontent00.net/hubfs/3478602/Sell%20Sheet%20Collateral%20Library/NGS/NGS%20User%20Guides/NGS_sFTP-Data-Download-Guide_Option%201_Nov03_2020.pdf)
## Tips
If you are working on DCC's shared space, and need a writing permission to download the files. You, or the owner of the directory, need to change the permission of the directory to "writable" for everyone.
chmod -R 777 Folder_name
This command line will fully open access to all (user/group/others).
It is also a good idea to remove permission from everyone, yes, including the owner, from writing a raw genome data. It is a good practice to work on copies of the raw data to prevent accidents.
For more information about file permissions, go [here](https://www.computerhope.com/unix/uchmod.htm).