# 1. Transfer Genome Sequence Data to the Cluster Using `sFTP`

###### tags: `2. Main Steps` `tutorials` `sftp` `genome sequence data`

**Note:** We used Genewiz for short-read sequencing services, so the steps below follow their instructions. Using sFTP with other companies is probably similar.

## Setup

1. Check that you have everything you need from the correspondence: host, username, and password.
2. Make sure you have access to the Duke Compute Cluster (DCC). These instructions assume that you will use `sftp` (short for "secure file transfer protocol") to transfer the genome sequence files *directly onto the cluster*, bypassing your local computer. This means the genome sequence data will *not*, at any point, be stored on your local computer.
3. What is a cluster? What is ssh? We are working on a document that explains these in a hopefully understandable way!

## Instructions

1. Log into your cluster using ssh.
2. Type in `sftp user.name@sftp.genewiz.com`
    * `sftp` stands for "secure file transfer protocol"
    * Do *not* use `user.name` literally; use the actual username given by Genewiz
    * This command gives you access to the Genewiz file storage allocated specifically to your username
3. Type `yes` when prompted about the key fingerprint. (The fingerprint identifies the Genewiz server, which your account on the cluster has not yet seen if this is your first time connecting.)
4. Enter the password provided by Genewiz.
    * Don't worry if you type and nothing shows up. The terminal does not display '********' while you type a password the way a web browser does. You ARE typing something; the screen just does not change.
    * **Pro tip:** Sarah found out that you can copy and paste the password
5. Set the local directory (on the cluster) where you want the files to be downloaded:
    ```
    lcd /datacommons/noor2/lethal_seq/all_lethal_fastq
    ```
6. Navigate through the sftp folders until you reach the directory that contains all the fastq files.
    * Use `ls` to list all files and directories within your working directory
    * Use `pwd` to print the path to the current working directory
    * Use `cd folder_name` to move down into a lower-level directory
    * Use `cd ..` to move up one level, to the 'parent directory'
    * If you don't remember all these commands, go [here](https://github.com/Noor-WGS-data/Genome_sequence_data/blob/main/Tutorials/useful_unix_commands.md).
7. There are two ways to download the files:
    * Once you are **INSIDE** the directory containing all the fastq files you want, download every file in it using `mget *`
    * Download the folder AND everything within it using the **recursive** command `get -r Parent_folder`, run from the directory that contains `Parent_folder`

For detailed instructions provided by Genewiz go [here.](https://f.hubspotusercontent00.net/hubfs/3478602/Sell%20Sheet%20Collateral%20Library/NGS/NGS%20User%20Guides/NGS_sFTP-Data-Download-Guide_Option%201_Nov03_2020.pdf)

## Tips

If you are working in DCC's shared space and need write permission to download the files, you (or the owner of the directory) need to make the directory writable for everyone:

    chmod -R 777 Folder_name

This command fully opens access (read, write, and execute) to all: user, group, and others.

Once the download is done, it is also a good idea to remove write permission on the raw genome data from everyone, yes, including the owner. It is good practice to work on copies of the raw data to prevent accidents.

For more information about file permissions, go [here](https://www.computerhope.com/unix/uchmod.htm).
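
The tip above about removing write permission from the raw data and working on a copy could look roughly like the sketch below. It is an illustration under assumptions, not an established procedure for this project: `raw_fastq`, `working_copy`, and `sample_1.fastq.gz` are hypothetical names, and the script runs inside a throwaway temporary directory so that nothing real is touched.

```shell
#!/bin/sh
# Sketch only: everything happens in a temporary directory.
# raw_fastq, working_copy, and sample_1.fastq.gz are hypothetical names.
set -eu

demo_dir=$(mktemp -d)
mkdir "$demo_dir/raw_fastq"
echo "placeholder read data" > "$demo_dir/raw_fastq/sample_1.fastq.gz"

# Remove write permission for user, group, and others on the raw data.
chmod -R a-w "$demo_dir/raw_fastq"

# Make a working copy, then give write permission back to the owner on the copy only.
cp -R "$demo_dir/raw_fastq" "$demo_dir/working_copy"
chmod -R u+w "$demo_dir/working_copy"

# Record the permission strings (the first 10 characters of `ls -l`) for both files.
raw_mode=$(ls -l "$demo_dir/raw_fastq/sample_1.fastq.gz" | cut -c1-10)
copy_mode=$(ls -l "$demo_dir/working_copy/sample_1.fastq.gz" | cut -c1-10)
echo "raw:  $raw_mode"
echo "copy: $copy_mode"

# Clean up: write permission has to be restored before the read-only directory can be deleted.
chmod -R u+w "$demo_dir/raw_fastq"
rm -rf "$demo_dir"
```

On real data, the two relevant commands are `chmod -R a-w` on the raw directory and `chmod -R u+w` on the copy. The permission string printed for the raw file should contain no `w`, while the one for the copy should show `w` for the owner.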