---
tags: workshop
---
# Human Microbiome Project CFDE February Hackathon Session
### Lesson Goals:
- Find HMP data in [CFDE Portal](https://app.nih-cfde.org/)
- Obtain a DRS ID for a file in the portal
- Transfer the file to AWS using the [GA4GH DRS Client](https://github.com/ga4gh/ga4gh-drs-client)
- Run FastQC on the file and download the results to your local computer
### Questions?
If you have questions,
- Type them in the group chat
- Direct message the moderator
- Unmute and ask them out loud
We're going to use the raised hand :raised_hand: reaction in Zoom to make sure people are on board during the hands-on activities.
### Helpful Links
[CFDE Portal](https://app.nih-cfde.org/): Find HMP and other data sets here!
[GitHub Issue for this session](https://github.com/nih-cfde/2022-feb-hackathon/issues/13): Find helpful links and information here!
[Hackathon Website](https://nih-cfde.github.io/2022-feb-hackathon/): For more information about hackathon activities!
[CFDE AWS Lesson Template](https://github.com/nih-cfde/training-and-engagement/blob/dev/hackmd/AWS.md): Reference for your AWS questions!
# Phase 1: Use the CFDE Portal to get HMP Data DRS
Go to the CFDE Search Portal at: [https://app.nih-cfde.org/](https://app.nih-cfde.org/)
Click on "File" on the home page.
![](https://i.imgur.com/l0G200Y.png)
Use the "Refine search" bar on the left side of the screen to select "HMP: Human Microbiome Project" under "Common Fund Program" and "true" under "Has Persistent ID"
![](https://i.imgur.com/FTHLVAD.png)
Also select "DNA Sequence" under "Refine search" and then copy the "Persistent ID" for the file in the second row.
You can also click on the column all the way to the left labeled "View" to access and download metadata associated with this file.
![](https://i.imgur.com/5WhhZhP.png)
The Persistent ID is a DRS ID that will allow us to download the file to an Amazon Web Services (AWS) EC2 instance.
```
drs://drs.hmpdacc.org/e6zJNvLeY6CP
```
#### Bonus Goal
Download the metadata associated with this file. You can do so by clicking "Export" after selecting "View" from the results table.
![](https://i.imgur.com/VjXKzaz.png)
You can also poke around the portal and find data you are interested in!
# Phase 2: Downloading and Working with HMP Data on Amazon Web Services EC2
## 1. Terminology and Sign-On
Cloud computing is the on-demand use of data storage and compute power without direct active management by the user. Amazon Web Services (AWS) is one of the most broadly adopted cloud platforms.
Some advantages of using AWS include:
- Easy sign-on
- Simple billing
- Stable services
- Customizable images
- Customer support
- Online resources
![](https://uploads-ssl.webflow.com/5e1f17bab0dc6527c1ecc801/5e55f0ab6725fd082d2ea435_amazon-hosting.jpeg)
Amazon's Elastic Compute Cloud (**EC2**) is a web service that provides secure, resizable compute capacity in the cloud. Amazon's Simple Storage Service (**S3**) is widely used for storing and sharing data.
An **instance** is a virtual machine that runs in the cloud. An **image** (or AMI for Amazon Machine Image) is a template that contains the software configuration (including operating system and applications) required to launch your instance. You can select an image provided by the AWS Marketplace, the AWS community, or you can select one of your own images. When you launch an instance, you specify the type of image to use.
Today, everything you do will be paid for by us. Your free login credentials will work for the next 24 hours. In the future, if you create an AWS account, you will have to add a credit card for billing. We'd be happy to answer questions about how to pay for AWS.
Log in to your account by going to this web address: https://cfde-training-workshop.signin.aws.amazon.com/console.
![](https://hackmd.io/_uploads/SJfyT66pt.png)
Log in with your first name as your IAM user name and the password provided by the instructors.
_Note: the table of user names was deleted on Mar 1._
> :raised_hand: Raise your hand in Zoom when you've successfully logged in with the workshop user credentials.
## 2. Launching an EC2 Instance
You can launch an instance using the AWS launch instance wizard. The launch instance wizard specifies all the launch parameters required for launching an instance. Where the launch instance wizard provides a default value, you can accept the default or specify your own value. At the very least, you need to select an AMI and a key pair to launch an instance. Let's walk through the following steps.
1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/
2. AWS has servers all over the world. In the top right corner, click the drop-down menu to select a global region. For this workshop, choose **US West (N. California) us-west-1**. _In the future, you should pick a region near you or one that contains your data._
![](https://hackmd.io/_uploads/Hk6dhoapY.png)
3. Now, click the [![ - Launch instances](https://img.shields.io/badge/_-Launch_instances-ec7211)](https://us-west-1.console.aws.amazon.com/ec2/v2/home?region=us-west-1#LaunchInstanceWizard:) button.
<!---
You should see a page that looks like this:
--->
4. AWS is beta testing a new version of the launch instance wizard that goes through all the steps on one page instead of many. It is much easier to use. Click the [![ - Try it now!](https://img.shields.io/badge/_-Try_it_now!-276fc4)](https://us-west-1.console.aws.amazon.com/ec2/v2/home?region=us-west-1#LaunchInstances:) button at the top to get started. _If you accidentally close the banner with the beta button, refresh the page to bring it up again._
![](https://hackmd.io/_uploads/BkDh0opaF.png)
5. First, give your instance a name (such as your first name) so that you can distinguish your instances from your classmates'. This is optional but very useful for keeping track of multiple instances on the same account.
6. The next step is to pick an image. Our preferred image is not listed in the Quick Start list, so we must find it in the Marketplace. Type **Ubuntu 20.04 LTS - Focal** in the search bar, then click **AWS Marketplace AMIs**. Once you see Ubuntu 20.04 LTS - Focal, click [![Select](https://img.shields.io/badge/Select-ec7211)](https://).
7. Next, we specify how much CPU and memory (RAM) we need by choosing an **instance type**. The **t2.micro** instance is "Free tier eligible" and provides 1 vCPU and 1 GiB of memory. This is enough for our class.
8. Next, **create a new key pair**. This will be used in a later section to connect to your instance via `ssh`. Give your key pair a name (without spaces). Use the default settings of RSA type and .pem format. Save this file locally (e.g., in your Downloads folder or on your Desktop).
9. For this workshop, we will keep the default network, security, and storage settings, with the one exception described in the next step.
10. Next, change the **Security group name** to something specific, like your first name.
![](https://i.imgur.com/kHVWxXb.png)
11. Scroll down to the bottom of the page and click [![Launch instance](https://img.shields.io/badge/Launch_instance-ec7211)](https://).
12. Once your instance launches, click the [![View all instances](https://img.shields.io/badge/View_all_instances-ec7211)](https://) button at the bottom of the page.
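For later reference, the console steps above can also be scripted with the AWS CLI. The sketch below is a hypothetical helper, not part of this lesson: it assumes the AWS CLI is installed and configured with credentials, and the AMI ID and key pair name are placeholders you would supply (the lesson does not give an AMI ID). Unlike the console wizard, it does not create a security group, so the instance joins the VPC's default security group, which may not allow the browser or `ssh` connections used later.

```shell
# Hypothetical helper: launch one t2.micro instance in us-west-1 with a
# Name tag, mirroring the console wizard steps. Set DRY_RUN=1 to print
# the command instead of running it.
#   $1 = AMI ID (placeholder, look it up for your region)
#   $2 = key pair name, $3 = instance name (e.g. your first name)
launch_instance() {
  # Replace the function's arguments with the full aws command line.
  set -- aws ec2 run-instances \
    --region us-west-1 \
    --image-id "$1" \
    --instance-type t2.micro \
    --count 1 \
    --key-name "$2" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$3}]"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

Running it with `DRY_RUN=1` first is a cheap way to check the command before it creates (and bills for) a real instance.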
> :raised_hand: Raise your hand in Zoom when you've successfully launched an instance.
Congratulations! You have successfully launched an instance. The next step is to connect to your instance.
## 3. Connecting to AWS instances
There are three ways to connect to an AWS instance:
1. with a web browser
2. using `ssh` from the Terminal
3. using an ssh client such as MobaXterm
Let's connect to our instances using a web browser.
1. Find your instance in the list of running instances.
2. Click the empty check box next to your instance's name.
3. Then click "Connect" in the top center of your browser.
![](https://hackmd.io/_uploads/BkcjPnTTt.png)
4. This will open a window that provides details about your instance. Click the [![Connect](https://img.shields.io/badge/Connect-ec7211)](https://) button at the bottom of your screen.
![](https://hackmd.io/_uploads/BkEUdnT6Y.png)
After you click connect, a new tab will open in your browser with a Terminal window that looks something like this.
![](https://hackmd.io/_uploads/By7fqaaaK.png)
> :raised_hand: Raise your hand in Zoom when you've successfully connected to your instance.
_If at any time your instance stops responding, click your browser's refresh button; functionality should be restored right where you left off._
### Installing Software
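First we need to install `pip`, Python's package installer. Ubuntu 20.04 already ships with Python 3, so we only need to make sure `python3` and `python3-pip` are installed.
```
sudo apt update
sudo apt install python3 python3-pip -y
```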
Install the GA4GH DRS client (`ga4gh-drs-client`), which is the DRS resolver that we will use.
For reference, full installation instructions are available [here](https://ga4gh-drs-client.readthedocs.io/en/latest/installation.html).
```
pip install ga4gh-drs-client
```
Refresh your browser terminal window so that the newly installed `drs` command is on your PATH. Then, confirm that the installation worked by bringing up the help message.
```
drs get --help
```
# Phase 3: Using the DRS ID to Download a File
The DRS ID we got from the portal is
```
drs://drs.hmpdacc.org/e6zJNvLeY6CP
```
which follows the format
```
drs://URL/OBJECT_ID
```
We need to format our command for the DRS resolver like this:
```
drs get URL OBJECT_ID
```
Let's format our DRS ID to match the desired formatting:
```
drs get https://drs.hmpdacc.org e6zJNvLeY6CP
```
After running that command, the terminal prints some log messages followed by the JSON metadata for the object:
```
2022-02-22 21:29:39,834DEBUGcommand-line arguments: {'url': 'https://drs.hmpdacc.org', 'object_id': 'e6zJNvLeY6CP', 'authtoken': 'omitted', 'download': False, 'expand': False, 'logfile': None, 'max_threads': 1, 'output_dir': '/home/ubuntu', 'output_metadata': None, 'silent': False, 'suppress_ssl_verify': False, 'validate_checksum': False, 'verbosity': None}
2022-02-22 21:29:39,834INFOissuing request to DRS Object endpoint
2022-02-22 21:29:39,834DEBUGURL: https://drs.hmpdacc.org/ga4gh/drs/v1/objects/e6zJNvLeY6CP
2022-02-22 21:29:39,835DEBUGHeaders: {}
2022-02-22 21:29:39,835DEBUGRequest params: {'expand': False}
2022-02-22 21:29:40,108INFOJSON for object e6zJNvLeY6CP successfully retrieved
{
  "id": "e6zJNvLeY6CP",
  "created_time": "2020-12-29T18:26:51.000Z",
  "drs_id": "e6zJNvLeY6CP",
  "checksums": [
    {
      "checksum": "dd85ff79c0d7e6710e39a77810bfc7a7",
      "type": "md5"
    },
    {
      "checksum": "c1fb4a5eb856810e43bb0158d71b820ad63b610aba5cff10cf3e95907ff3aabb",
      "type": "sha-256"
    }
  ],
  "self_uri": "drs://drs.hmpdacc.org/e6zJNvLeY6CP",
  "size": 99527955,
  "name": "SRS014475_hmwgsqcv1.tar.bz2",
  "access_methods": [
    {
      "access_url": {
        "url": "s3://hmpdcc/hmp1/hhs/microbiome/wms/analysis/qc/qc_2012/SRS014475_hmwgsqcv1.tar.bz2"
      },
      "type": "s3",
      "region": "us-east-1"
    },
    {
      "access_url": {
        "url": "https://hmpdcc.s3.amazonaws.com/hmp1/hhs/microbiome/wms/analysis/qc/qc_2012/SRS014475_hmwgsqcv1.tar.bz2"
      },
      "type": "https"
    }
  ]
}
2022-02-22 21:27:52,597INFOobject/bundle download not requested
2022-02-22 21:27:52,597INFOexiting with exit code: 0
```
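Rather than copying the `https` `access_url` out of the JSON by hand, you could extract it programmatically. The sketch below is a hypothetical helper that reads only the JSON (not the surrounding log lines) on stdin and uses `python3`'s standard `json` module, which is available on the Ubuntu 20.04 image.

```shell
# Hypothetical helper: read one DRS object as JSON on stdin (only the JSON,
# without the log lines) and print the URL of its "https" access method.
# Uses python3's json module, which ships with Ubuntu 20.04.
https_access_url() {
  python3 -c '
import json, sys

obj = json.load(sys.stdin)
for method in obj.get("access_methods", []):
    if method.get("type") == "https":
        print(method["access_url"]["url"])
        break
else:
    sys.exit("no https access method found")
'
}
```

The log above shows that the client fetched the JSON from `https://drs.hmpdacc.org/ga4gh/drs/v1/objects/e6zJNvLeY6CP`, so `curl -s https://drs.hmpdacc.org/ga4gh/drs/v1/objects/e6zJNvLeY6CP | https_access_url` should print the `https` URL listed in the JSON.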
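Next, we use `curl` to download a file to our Amazon instance. The `access_url` entries in the JSON tell us where the DRS object is stored, and the `https` one can be passed directly to `curl -O`. Note that the command below downloads a different file from the same sample (SRS014475): a compressed FASTQ of unaligned reads (`SRS014475.MetaRef-unaligned_reads.fastq.tar.bz2`), which is the file we run through FastQC in the next steps.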
```
curl -O https://hmpdcc.s3.amazonaws.com/hmp1/hhs/microbiome/wms/analysis/alignments/read2ref_2017/SRS014475.MetaRef-unaligned_reads.fastq.tar.bz2
```
When we type `ls` we should see the file `SRS014475.MetaRef-unaligned_reads.fastq.tar.bz2`
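The DRS JSON also lists checksums, which let you confirm that a download arrived intact. The sketch below is a hypothetical helper built on GNU `md5sum` (preinstalled on Ubuntu; macOS uses `md5` instead). Keep in mind that the md5 in the JSON above belongs to `SRS014475_hmwgsqcv1.tar.bz2`, the file the DRS record describes, not to the MetaRef file downloaded here.

```shell
# Hypothetical helper: check a downloaded file against an expected md5
# checksum, such as the "md5" entry in the DRS JSON.
#   $1 = expected md5 checksum, $2 = path to the downloaded file
verify_md5() {
  local expected="$1" file="$2"
  # md5sum -c reads "CHECKSUM  FILENAME" lines (two spaces) from stdin;
  # --status prints nothing and reports the result via the exit code.
  printf '%s  %s\n' "$expected" "$file" | md5sum -c --status -
}
```

For example, after downloading the file from the JSON's `https` `access_url`, `verify_md5 dd85ff79c0d7e6710e39a77810bfc7a7 SRS014475_hmwgsqcv1.tar.bz2 && echo OK` should print `OK` only if the file is intact.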
This file is compressed. Let's uncompress it so we can work with it.
```
tar -xf SRS014475.MetaRef-unaligned_reads.fastq.tar.bz2
```
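Before running FastQC, a quick sanity check is to count the reads in the extracted file. Each FASTQ record spans four lines (header, sequence, `+` separator, quality string), so the read count is the line count divided by four. This sketch assumes an uncompressed FASTQ whose sequences are not wrapped across lines; `count_reads` is a hypothetical name.

```shell
# Sketch: count reads in an uncompressed FASTQ file, assuming exactly
# four lines per record (no wrapped sequence or quality lines).
count_reads() {
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}
```

For example, `count_reads SRS014475.MetaRef-unaligned_reads.fastq` prints the number of reads FastQC is about to analyze.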
### Working with the HMP file
Now that we have a fastq file on AWS, let's run it through FastQC to look at sequence quality!
```
sudo apt install fastqc -y
```
To double check it was successful, type `fastqc --version`. If it returns 0.11.9, that means installation was successful. You can also type `fastqc --help` to view the manual.
A FastQC command looks like this: `fastqc -o <output directory> <file>`
The output directory must exist! Let's make a directory called `output`.
```
mkdir output
```
We can run FastQC.
```
fastqc -o output SRS014475.MetaRef-unaligned_reads.fastq
```
The standard output looks like this:
```
Started analysis of SRS014475.MetaRef-unaligned_reads.fastq
Approx 10% complete for SRS014475.MetaRef-unaligned_reads.fastq
Approx 25% complete for SRS014475.MetaRef-unaligned_reads.fastq
Approx 35% complete for SRS014475.MetaRef-unaligned_reads.fastq
Approx 50% complete for SRS014475.MetaRef-unaligned_reads.fastq
Approx 60% complete for SRS014475.MetaRef-unaligned_reads.fastq
Approx 75% complete for SRS014475.MetaRef-unaligned_reads.fastq
Approx 85% complete for SRS014475.MetaRef-unaligned_reads.fastq
```
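If you later download several FASTQ files, the single-file command can be wrapped in a loop. This is a sketch under stated assumptions: `run_fastqc_all` is a hypothetical helper, it only picks up uncompressed `*.fastq` files in the current directory, and it uses `mkdir -p`, which creates the output directory only if it is missing, so rerunning it is safe.

```shell
# Sketch: run FastQC on every *.fastq file in the current directory,
# writing reports to the directory given as $1 (default: output).
run_fastqc_all() {
  local outdir="${1:-output}" fq
  mkdir -p "$outdir"            # no error if the directory already exists
  for fq in *.fastq; do
    [ -e "$fq" ] || continue    # no matches: the glob stays literal, skip it
    fastqc -o "$outdir" "$fq"
  done
}
```

Running `run_fastqc_all output` from your home directory reproduces the command above for this lesson's single file.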
Now, we can navigate to our results directory to view the results.
```
cd output
ls
```
FastQC produces two outputs for every input file: an HTML report and a zip archive. The HTML report is the one we want to view.
```
SRS014475.MetaRef-unaligned_reads_fastqc.html SRS014475.MetaRef-unaligned_reads_fastqc.zip
```
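The report names follow the input file name, with the FASTQ extension replaced by `_fastqc.html` and `_fastqc.zip`. The sketch below predicts those names; it is a hypothetical helper that only handles the `.fastq`, `.fq`, and `.gz` suffixes, and FastQC may treat other extensions differently.

```shell
# Sketch: print the report names FastQC is expected to write for a FASTQ
# file, assuming it strips .gz and then .fastq or .fq from the name.
fastqc_reports_for() {
  local base
  base=$(basename "$1")
  base="${base%.gz}"      # sample.fastq.gz -> sample.fastq
  base="${base%.fastq}"   # sample.fastq    -> sample
  base="${base%.fq}"      # sample.fq       -> sample
  printf '%s_fastqc.html\n%s_fastqc.zip\n' "$base" "$base"
}
```

Running `fastqc_reports_for SRS014475.MetaRef-unaligned_reads.fastq` prints the two file names listed above; if `ls output` is missing either one, FastQC probably did not finish.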
> Click the raised hand :hand: if you have html files in your `output` directory.
Congratulations! You have successfully launched and connected to an instance, navigated the file system, downloaded data, installed programs, and executed programs at the command line.
#### Now, try finding data you are interested in on the CFDE Portal and transporting it to this Amazon AWS instance!
# The Final Frontier: Downloading Data from AWS to your local computer
After processing your data in the cloud, you will most likely need to copy some of your files to your local computer for viewing and sharing. In this section, we will use our ssh key and the public DNS name of our instance to securely copy files from the cloud to our local computer, either through an SSH client (MobaXterm on Windows) or with secure copy (`scp`) on macOS.
### Windows Users
If you have a Windows machine, you will need to download a Terminal program. We recommend MobaXterm which is both a Terminal and an SSH client.
Read the following steps and/or watch [this short video tutorial](https://us06web.zoom.us/rec/play/1GfdPKpeJ5CVd8L6aPdYnOYuU3VmRmeoIHmdChyTNtUvpPIbezzdxdAGghAmsDPzhrGdi2SgkGSa_RqZ.7Ml7d475z1S9cItV).
**MobaXterm installation**
1. Go to the MobaXterm website to [download](https://mobaxterm.mobatek.net/)
2. Click on "GET MOBAXTERM NOW!"
3. The Home Edition works great and is free. Click "Download now".
4. Click on "MobaXterm Home Edition v20.6 (Portable edition)" and save it in your Downloads folder.
5. Go to your Downloads folder, click on the zipped folder, click "Extract all", click "Extract"
6. The MobaXterm application is now in the unzipped folder
7. Click on the MobaXterm application to open it!
Now that you have MobaXterm installed, you need to find the user name and public DNS address of your instance. To do so, let's reconnect to our instances.
#### (Re)Connect to your EC2 instance
1. In a new browser tab or window, navigate to the [instances page](https://us-west-1.console.aws.amazon.com/ec2/v2/home?region=us-west-1#Instances:).
2. Check the empty box next to your instance.
3. Click the "Connect" button.
4. Click the **SSH client** tab.
5. Find the "Example:" ssh command. Copy the last piece of information, which contains the public DNS name of your instance. It will look something like "ec2-54-193-121-227.us-west-1.compute.amazonaws.com"
![](https://i.imgur.com/wERegQo.png)
6. In MobaXterm, click on "Session"
7. Click on "SSH"
8. Enter the Public DNS as the "Remote host"
9. Check the box next to "Specify username" and enter "ubuntu" as the username
10. Click the "Advanced SSH settings" tab
11. Check the box next to "Use private key"
12. Use the document icon to navigate to where you saved the private key (e.g., "amazon.pem") from AWS on your computer. It is likely in your Desktop or Downloads folder
13. Click "OK"
14. A terminal session should open up with a left-side panel showing the file system of our AWS instance!
15. In the left-side panel, open the `output` folder and click on one of the FastQC html files to view it in a browser.
> Click the raised hand :hand: in Zoom once you have opened an html file.
### MacOS
Mac users do not need to install any additional programs to transfer files. You do however need to locate the ssh key file you saved at the beginning of the workshop.
1. Open a Terminal window
2. Navigate to the folder containing your private key file and change its permissions with `chmod 400` so that only you can read it (`ssh` refuses to use private keys that other users can read). _Note: your .pem file may be in a different directory and have a different name. Modify the following commands accordingly._
```
cd ~/Desktop/
chmod 400 key-file.pem
```
3. In a new browser tab or window, navigate to the [instances page](https://us-west-1.console.aws.amazon.com/ec2/v2/home?region=us-west-1#Instances:)
4. Check the empty box next to your instance.
5. Click the "Connect" button.
6. Click the **SSH client** tab.
7. Find the "Example:" ssh command. Copy the last piece of information, which contains the public DNS for your instance and the computer name. It will look something like "ubuntu@ec2-54-193-121-227.us-west-1.compute.amazonaws.com"
![](https://i.imgur.com/3Gvqyy6.png)
8. Use the `scp` command in your local terminal to copy the `.html` files. The `-i` option specifies the ssh key file. As with the copy (`cp`) command, you must specify both the location of the source file and the destination. For the source, start with the user name and the public DNS name of your instance (`ubuntu@ec2-.....amazonaws.com`), then add a `:` followed by the path to the file on the instance.
Your command will look something like this. Remember to use your .pem file and your DNS. You can specify the current directory on your local computer with `.`
```
scp -i key-file.pem ubuntu@ec2-54-193-121-227.us-west-1.compute.amazonaws.com:~/output/SRS014475.MetaRef-unaligned_reads_fastqc.html .
```
If this is your first time connecting to an instance, you may be prompted with the question "Are you sure you want to continue connecting (yes/no/[fingerprint])?". Type "yes".
If you want to copy all the html files, put the remote path in single quotes so that your local shell does not expand the `*` wildcard; the instance expands it instead.
```
scp -i key-file.pem 'ubuntu@ec2-54-193-121-227.us-west-1.compute.amazonaws.com:~/output/*fastqc.html' .
```
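If you copy reports often, the command above can be wrapped in a small function. This is a hypothetical helper for macOS or Linux terminals; it assumes the reports are in `~/output` on the instance, as in this lesson, and it offers a `DRY_RUN=1` mode that prints the command instead of running it. Double quotes work here in place of the single quotes above: they also stop your local shell from expanding `*`.

```shell
# Hypothetical helper: copy every FastQC HTML report from ~/output on the
# instance into the current local directory.
#   $1 = path to your .pem key, $2 = your instance's public DNS name
fetch_reports() {
  local key="$1" dns="$2"
  # Quoted, so the local shell leaves ~ and * for the instance to expand.
  local remote="ubuntu@${dns}:~/output/*_fastqc.html"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo scp -i "$key" "$remote" .
  else
    scp -i "$key" "$remote" .
  fi
}
```

For example: `fetch_reports key-file.pem ec2-54-193-121-227.us-west-1.compute.amazonaws.com`, using your own key file and DNS name.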
> Click the raised hand :hand: in Zoom once you have opened an html file.
Congratulations! You have now successfully downloaded files from the cloud to your local computer.
## Shutting down instances
The AWS Free Tier of services only remains free if you stay within the usage limits. While your instance is running, you may be charged even if you aren't using its compute power or storage. It is therefore good practice to shut down your instances when not in use.
There are three options for shutting down instances.
- Stopping:
    - saves data to the EBS root volume
    - only EBS data storage charges apply
    - no data transfer charges or instance usage charges
    - RAM contents are not stored
- Hibernation:
    - charged for storage of any EBS volumes
    - stores the RAM contents
    - it's like closing the lid of your laptop
- Termination:
    - complete shutdown
    - EBS volume is detached
    - data stored in the EBS root volume is lost forever
    - instance cannot be relaunched
These accounts will remain available for 24 hours before your instructor deletes them. If you wish to return to your instance within the next 24 hours, stopping it is a good idea. If you are done practicing, terminating the instance is the best idea.
To shut down an instance:
1. Navigate to the [instances page](https://us-west-1.console.aws.amazon.com/ec2/v2/home?region=us-west-1#Instances:)
2. Check the empty box next to your instance
3. Click the "Instance state" button
4. Select "Stop instance" or "Terminate instance" as appropriate
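The same two actions are available from the AWS CLI, which is handy once you manage instances from your own account. The sketch below is a hypothetical helper: it assumes the AWS CLI is installed and configured with credentials allowed to manage the instance, and the instance ID shown in the usage note is a placeholder (you can copy yours from the instances page). `DRY_RUN=1` prints the command instead of running it.

```shell
# Hypothetical helper: stop or terminate one instance in us-west-1.
#   $1 = stop | terminate, $2 = instance ID
shutdown_instance() {
  local action
  case "$1" in
    stop)      action=stop-instances ;;       # keeps the EBS root volume
    terminate) action=terminate-instances ;;  # permanent, cannot relaunch
    *) echo "usage: shutdown_instance stop|terminate INSTANCE_ID" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo aws ec2 "$action" --region us-west-1 --instance-ids "$2"
  else
    aws ec2 "$action" --region us-west-1 --instance-ids "$2"
  fi
}
```

For example, `shutdown_instance stop i-0123456789abcdef0` (with your own instance ID) matches choosing "Stop instance" in the console.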