GCB Academy: Intro to Scientific Computing for Genomics.
November 4-5, 2019
Course website: https://duke-gcb.github.io/SciComp-Nov-2019/ (includes links to materials)
Instructors: Hilmar Lapp, Dan Leehr
Stuck? Put your pink sticky note on the top of your computer.
Monday Morning - Shell
Launching the shell: Git Bash on Windows, Terminal on macOS.
Commands:
pwd - print working directory, where am I?
ls - list files in current directory
ls -F - format directories by trailing /
ls -lh - show human-readable sizes
ls -a - include hidden files/directories
cd - change directories
man ls - manual page for a command, e.g. ls
Note: to exit the manual page, press the q key on your keyboard
Exercise: What additional information do you learn with ls -l?
Answer: Date modified, file size, owner, permissions, directory or not
Tab completion:
While typing a command or file name, hit the tab key and the shell will finish the word (or help as much as it can!)
cd un<tab> becomes cd untrimmed_fastq/
Tab completion also saves you from simple mistakes that are hard for us to spot but easy for computers. The computer knows its filenames, let it finish them for you.
Multiple matches? Start typing and hit tab. The shell will sound a ding, and list possible completions. Keep typing and hit tab again once it can pick a single result.
Navigation shortcuts
cd (by itself, no directory name) - changes to your home directory (cd ~ has the same result)
cd Desktop/shell_data/untrimmed_fastq - go directly into untrimmed_fastq with one command. Remember tab completion!
cd .. - go one level up in the directory tree (parent directory)
cd ~/Desktop/shell_data - works from anywhere on the filesystem since we begin with ~ (home directory)
cd - - go back to where I came from (the previous directory you were in)
Exercise: Find the hidden directory
Relative and Absolute paths
Relative Path: Any path that doesn't begin with a slash (e.g. Desktop/untrimmed_fastq). Relative paths are relative to where you are.
Absolute Path: Any path that begins with a slash (e.g. /Users/sam/Desktop/untrimmed_fastq)
/ is the root directory
Files and Directories
cd ~/Desktop/untrimmed_fastq
ls *.fastq - list all files ending with .fastq
ls *977.fastq - list all files ending with 977.fastq
ls /bin/*sh - list all the commands that are shells
echo $SHELL - which shell am I running?
history - command history with numbers
!n (where n is a number from the history) - execute the nth command
!! is replaced with the contents of the previous command
~ is an absolute path to the home directory
Working with Files
cat SRR098026.fastq - outputs ALL contents of the file to the screen (good for small files)
less SRR098026.fastq - lets you navigate and scroll through the file
space - move forward in the file
/ + [search term] + [enter] to search within the document
b - move backwards in the file
g - go to the beginning of the file
G - go to the end of the file
q - to exit 'less'
Head and Tail
head - prints the first 10 lines of a file to the screen (default is 10 lines)
tail - prints the last 10 lines of a file to the screen (default is 10 lines)
-n 100 to instead print the first (head) or last (tail) 100 lines of the file to the screen
example:
head -n 1 SRR098026.fastq
will output the first line of the file:
@SRR098026.1 HWUSI-EAS1599_1:2:1:0:968 length=35
FASTQ File Format
A FASTQ file normally uses four lines per sequence.
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
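The four-line layout can be sketched in Python (the language used later in these notes). This is only an illustration, assuming every record uses exactly four lines; the function name parse_fastq is made up for this example and is not part of the course material.

```python
# Toy FASTQ text holding the single record shown above.
fastq_text = """@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
"""


def parse_fastq(lines):
    """Group lines into (identifier, sequence, quality) tuples, four lines per record."""
    records = []
    for i in range(0, len(lines) - 3, 4):
        identifier = lines[i][1:]   # line 1: identifier, leading '@' dropped
        sequence = lines[i + 1]     # line 2: the bases
        quality = lines[i + 3]      # line 4: quality scores (line 3 is the '+' separator)
        records.append((identifier, sequence, quality))
    return records


records = parse_fastq(fastq_text.splitlines())
for identifier, sequence, quality in records:
    # Each base has one quality character, so the two lengths match.
    print(identifier, len(sequence), len(quality))
```

The sequence and quality lines always have the same length, which is a quick sanity check on a FASTQ record.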
Creating, moving, copying, and removing files
cp to copy a file
cp /original_path/file.txt /new_path/file-copy.txt
mkdir to make a new directory (folder)
mkdir backup makes a new directory called "backup"
mv can move a file, rename it, or both
mv SRR098026-copy.fastq backup/ moves SRR098026-copy.fastq to the backup directory
mv SRR098026-copy.fastq SRR098026-backup.fastq renames SRR098026-copy.fastq to SRR098026-backup.fastq in the same directory
mv /backup/file.txt .
. indicates your current directory
Adding -i to the mv command makes sure you don't accidentally overwrite files
mv -i file1.txt file2.txt
File Permissions
r - read
w - write
x - execute
ls -l shows you the permissions of your files: -rwxr-xr--
The first - character is the file type
The next rwx characters are permissions for the file's owner
The following r-x characters are permissions for the group
The final r-- characters are permissions for all other users
chmod - change mode of the file (change permission settings)
chmod -w file.txt removes write permissions
rm file.txt will then stop with a write-protection prompt or a "permission denied" message rather than deleting the file right away
chmod +w file.txt adds write permissions
chmod g+w file.txt adds group write permission
chmod u+w file.txt gives owner write permission
More examples here: https://www.washington.edu/computing/unix/permissions.html
Removing a file or directory
rm permanently deletes the file (i.e. does not go into your trash or recycle bin)
rm -r remove recursive - to delete a directory and everything in it. NOTE: this command should be used with care, as it can permanently delete a lot of data
Exercise: Make a backup of your FASTQ files and move them to a new backup/ directory. Remove 'write' permission for the backup files.
Answer:
rm -r backup
cp SRR098026.fastq SRR098026-backup.fastq and cp SRR097977.fastq SRR097977-backup.fastq
mkdir backup and mv *-backup.fastq backup
chmod -w backup/*-backup.fastq
Redirection and searching with grep
grep is a program that allows you to search within a file without having to open it.
grep NNNNNNNNNN SRR098026.fastq will print to the terminal every line in the SRR098026 file that contains at least 10 consecutive Ns, regardless of how long or short the file is.
Search for a term within a specific context:
grep -B 1 -A 2 NNNNNNNNNN SRR098026.fastq
-B - number of lines to show before each match
-A - number of lines to show after each match
Exercise
grep -B1 GNATNACCACTTCC SRR098026.fastq
returns
@SRR098026.245 HWUSI-EAS1599_1:2:1:2:801 length=35
GNATNACCACTTCCAGTGCTGANNNNNNNGGGATG
grep -B1 AAGTT *.fastq
returns
SRR097977.fastq-@SRR097977.11 209DTAAXX_Lenski2_1_7:8:3:247:351 length=36
SRR097977.fastq:GATTGCTTTAATGAAAAAGTCATATAAGTTGCCATG
--
SRR097977.fastq-@SRR097977.67 209DTAAXX_Lenski2_1_7:8:3:544:566 length=36
SRR097977.fastq:TTGTCCACGCTTTTCTATGTAAAGTTTATTTGCTTT
--
SRR097977.fastq-@SRR097977.68 209DTAAXX_Lenski2_1_7:8:3:724:110 length=36
SRR097977.fastq:TGAAGCCTGCTTTTTTATACTAAGTTTGCATTATAA
--
SRR097977.fastq-@SRR097977.80 209DTAAXX_Lenski2_1_7:8:3:258:281 length=36
SRR097977.fastq:GTGGCGCTGCTGCATAAGTTGGGTTATCAGGTCGTT
--
SRR097977.fastq-@SRR097977.92 209DTAAXX_Lenski2_1_7:8:3:353:318 length=36
SRR097977.fastq:GGCAAAATGGTCCTCCAGCCAGGCCAGAAGCAAGTT
--
SRR097977.fastq-@SRR097977.139 209DTAAXX_Lenski2_1_7:8:3:703:655 length=36
SRR097977.fastq:TTTATTTGTAAAGTTTTGTTGAAATAAGGGTTGTAA
--
SRR097977.fastq-@SRR097977.238 209DTAAXX_Lenski2_1_7:8:3:592:919 length=36
SRR097977.fastq:TTCTTACCATCCTGAAGTTTTTTCATCTTCCCTGAT
--
SRR098026.fastq-@SRR098026.158 HWUSI-EAS1599_1:2:1:1:1505 length=35
SRR098026.fastq:GNNNNNNNNCAAAGTTGATCNNNNNNNNNTGTGCG
Redirection to capture the output
> is the operator to redirect output
e.g. grep -B1 -A2 NNNNNNNNNN SRR098026.fastq > bad_reads.txt will redirect the results of this grep search to the bad_reads.txt file
wc is short for "word count" - returns information about your file:
adding -l returns only the number of lines e.g. wc -l file.txt
Exercise: How many sequences in SRR098026.fastq contain at least 3 consecutive Ns?
Answer: after redirecting the matching lines to bad_reads.txt (e.g. grep NNN SRR098026.fastq > bad_reads.txt), wc -l bad_reads.txt returns 249, so 249 sequences in the file have at least 3 consecutive Ns
>> appends to an existing file e.g. grep -B1 -A2 NNNNNNNNNN SRR097977.fastq >> bad_reads.txt
Using > to redirect output to an existing file will overwrite the contents of the file
You can also use a wildcard to redirect the output from multiple files to the same output file at once e.g.
grep -B1 -A2 NNNNNNNNNN *.fastq > bad_reads.txt
NOTE on file extensions:
The pipe command is |
grep -B1 -A2 NNNNNNNNNN SRR098026.fastq | wc -l will perform your grep search then "pipe" the output to the wc program, so your output will be the number of lines returned from this grep search.
For Loops
Performs a command over a set of parameters or conditions
To define a variable (no spaces around the = sign):
variable_name=information
ex:
f=Hello
To access a variable:
$variable_name ex: $f
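Loop structure template:
for iterable_name in list_of_things
do
command $iterable_name
done
ex1, print the names of fastq files in the present working directory:
for f in *.fastq
do
echo $f
done
ex2, make a backup of fastq files in the present working directory:
for f in $fastq; do echo cp $f $f.backup ; done
(this assumes the variable fastq holds the list of file names; a wildcard such as *.fastq in its place also works. The echo prints each cp command without running it; remove echo to actually make the copies)
The iterable_name is a variable defined in the loop
For each loop, the iterable_name will contain the next item in the list_of_things
You can write the loop in one line, using a semicolon to delimit each step (ex2)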
Testing your Cluster Account
From your shell prompt, enter:
Replace NETID with your Duke NetID, and enter your NetID password when prompted.
If prompted to confirm that you want to continue connecting, type yes.
If you are greeted with a bash prompt, your account is working correctly.
If you are unable to login, please add your NetID below:
Monday Afternoon - Programming with Python
File link: https://github.com/Duke-GCB/SciComp-Nov-2017/releases/download/1.0/python-fasta.zip
Running Jupyter Notebook
Method 1: Use Anaconda Navigator and run jupyter
Method 2: Type jupyter notebook into the terminal
Printing and Strings
Functions in Python are used with the following syntax: function()
The parentheses contain the parameters you want to give the function
To print, use the print() function. Ex: print('Hello')
A string is a datatype that holds a sequence of characters, such as a word. In the example above, 'Hello' is in quotes, indicating it is a string
Variables are set in Python in the following way:
variable_name = information
To use a variable, you type the variable_name, unlike in the terminal where you add a '$' beforehand
Ex, prints the information stored in the variable called name:
You can use numbers (and other datatypes) in print as well
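A minimal sketch of the points above. The values stored in name and read_count are placeholders chosen for illustration, not values from the lesson.

```python
# Store a string in a variable; no '$' is needed to use it later.
name = 'Ada'          # placeholder value
print(name)

# Numbers work the same way and need no quotes.
read_count = 249      # placeholder value
print(read_count)

# print() accepts several items of different datatypes at once.
print('Number of reads:', read_count)
```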
Indexing
You can access parts of a string (or list) with indexing.
General format:
Example, print the first letter in Python:
Indices in Python start at 0, so the first letter of a string or the first item in a list is at index 0
A slice starts at the start position you indicate and stops just before the end position. The end value is exclusive, so the slice goes up to that position rather than including it
Example, print 'ytho' from 'Python':
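The two examples described above could look like the following sketch. The variable name word is an assumption made for this illustration.

```python
word = 'Python'

# Index 0 is the first character.
first_letter = word[0]
print(first_letter)   # P

# Slice from index 1 up to (but not including) index 5.
middle = word[1:5]
print(middle)         # ytho
```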
Check if a letter is in a string
General format:
Example, returns true:
Example, returns false:
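A short sketch of the membership check with the in keyword, using 'Python' as an assumed example string. Note that the check is case sensitive.

```python
word = 'Python'

has_upper_p = 'P' in word   # True: 'P' occurs in the string
has_z = 'z' in word         # False: 'z' does not occur
has_lower_p = 'p' in word   # False: the check is case sensitive

print(has_upper_p, has_z, has_lower_p)
```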
Working with sequences
Given the sequence:
seq = 'ACCTGCATGC'
Ex1, reverse the first 3 letters
For strings, you cannot edit them in the following way:
string[3] = 'a'
Strings are immutable datatypes, which cannot be directly changed.
You can combine parts of a string together to make a new string (Ex1).
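One possible way to write Ex1, shown as a sketch and not necessarily the approach used in class: build a new string from pieces of the old one. The try/except only demonstrates the error raised when a string is edited in place.

```python
seq = 'ACCTGCATGC'

# Strings are immutable: assigning to a position raises a TypeError.
try:
    seq[3] = 'a'
except TypeError as error:
    print('Cannot edit a string in place:', error)

# Build a new string instead: first three letters reversed, then the rest.
new_seq = seq[2] + seq[1] + seq[0] + seq[3:]
print(new_seq)   # CCATGCATGC
```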
Reversing a sequence
You can reverse a sequence with indexing. Recall that the general format is string[start:stop:step]
If you make the last parameter (the step) -1, it will count backwards and reverse the string
Ex2, reverse a string via indexing
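A sketch of reversing by slicing, assuming the seq value defined earlier in these notes. Leaving start and stop empty selects the whole string, and a step of -1 walks it backwards.

```python
seq = 'ACCTGCATGC'

# Empty start and stop cover the whole string; step -1 reverses it.
reversed_seq = seq[::-1]
print(reversed_seq)   # CGTACGTCCA
```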
Looping through a DNA sequence
A for loop can iterate through a string; each iteration picks the next letter in the string
Ex1, print each letter in a sequence separately:
The loop structure is a bit different from bash in the terminal.
General format of a loop in python:
A loop in python requires a colon at the end of the for statement and each line of code within the loop to be indented
Ex2, reverse a string via for loop:
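The two loop examples could be sketched as follows. The variable names base and reversed_seq are assumptions for this illustration.

```python
seq = 'ACCTGCATGC'

# Ex1: the colon ends the for statement and the loop body is indented.
for base in seq:
    print(base)

# Ex2: build the reverse by putting each new letter in front of the result.
reversed_seq = ''
for base in seq:
    reversed_seq = base + reversed_seq
print(reversed_seq)   # CGTACGTCCA
```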
Dictionaries
Creating a dictionary:
Empty dictionary:
dictionary_name = {}
Dictionary with keys and values:
Adding a key:value pair to a dictionary you already made
Ex, a dictionary to give the complement letter
Getting the value from a dictionary using a key
Will return the value for key1, which we defined before as 'value1'
For keys, if the key is a string you will have to use the same exact string, including any capitalization.
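The dictionary operations above, as a hedged sketch. The names example and comps and their contents are illustrative; comps follows the complement-letter idea mentioned above.

```python
# Dictionary with keys and values.
example = {'key1': 'value1', 'key2': 'value2'}

# Add a key:value pair to an existing dictionary.
example['key3'] = 'value3'

# A dictionary that gives the complement of each DNA letter.
comps = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}

# Look up values by key.
print(example['key1'])   # value1
print(comps['A'])        # T

# Keys must match exactly, including capitalization.
print('a' in comps)      # False
```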
Lists
Lists are containers that can contain items of any datatype (even other lists)
To make a list:
A list can contain different datatypes
Ex, a list with a string, integer (whole number), float (number with decimal points), and another list
You can loop (and index) through a list, just like you can loop through a DNA sequence or string
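A small sketch of making, indexing, and looping over lists. The list contents are made up for illustration.

```python
# A list of strings.
molecules = ['rna', 'dna', 'protein']

# A list holding a string, an integer, a float, and another list.
mixed = ['ACGT', 4, 0.25, ['nested', 'list']]

# Indexing works as it does for strings, starting at 0.
print(molecules[0])   # rna
print(mixed[3][0])    # nested

# Looping visits each item in turn.
for item in mixed:
    print(item, type(item).__name__)
```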
Conditionals, if, elif, else statements
Python has several operators for testing whether something is true or false, used in conditional statements. Some common ones are:
== - equals
> - greater than
< - less than
in - item in a list, string or iterable sequence
not in - item not in a list or iterable sequence
You can also use and, or to evaluate multiple statements. or means at least one of the statements must be true, while and means that both have to be true
Ex, all return True:
If, elif, and else can be used to perform a task, given a certain condition
General format:
Python will evaluate the if statement first. If it is false, Python evaluates the elif (else if) statement. If the elif statement is also false, it runs the else block. If the first if statement is true, the elif and else are not evaluated.
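A sketch of the comparisons and the if / elif / else order described above. The function describe_gc and its 0.6 and 0.4 thresholds are made up for this example.

```python
# Comparisons that all evaluate to True.
checks = [
    3 == 3,
    5 > 2,
    2 < 5,
    'A' in 'ACGT',
    'N' not in 'ACGT',
    5 > 2 and 2 < 5,   # both sides must be true
    5 < 2 or 2 < 5,    # one true side is enough
]
print(checks)


def describe_gc(seq):
    """Label a sequence by its GC fraction (thresholds are arbitrary examples)."""
    gc_fraction = (seq.count('G') + seq.count('C')) / len(seq)
    if gc_fraction > 0.6:
        return 'high GC'
    elif gc_fraction < 0.4:
        return 'low GC'       # only checked when the if was false
    else:
        return 'moderate GC'  # runs when both tests above were false


print(describe_gc('GGCC'))   # high GC
print(describe_gc('ATAT'))   # low GC
print(describe_gc('ACGT'))   # moderate GC
```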
Functions in Python
There are many built-in functions available in Python (such as the print() function). See the documentation for a list: https://docs.python.org/3/library/functions.html
len() gives you the length of an object
output: 10
Some commonly used functions:
sorted() will sort a list
output: [2, 3, 4, 7, 8]
You can also reverse the way sorted() sorts the list:
You can run help(sorted) to learn about what other built-in options are available for a particular function.
Jupyter notebook also allows you to view this in an alternate way:
Press shift+tab while your cursor is in the function call
Press shift+tab again to get the pop-up to stay visible, and then hit these keys one more time (3 times total) to get the menu to appear separately at the bottom of your page.
split() will split a string on a specified separator (such as '.' or ' ' (space)) into multiple list items. The default separator is white space between words.
e.g.
returns:
['rna', 'dna', 'protein'] 3
(a list with 3 items instead of just 1)
returns:
Whereas:
returns:
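The built-in functions above can be tried with the sketch below. The input values are assumptions chosen so that the results match the outputs quoted in these notes (10, [2, 3, 4, 7, 8], and a 3-item list).

```python
seq = 'ACCTGCATGC'
print(len(seq))                        # 10

numbers = [7, 3, 8, 2, 4]              # assumed input order
print(sorted(numbers))                 # [2, 3, 4, 7, 8]
print(sorted(numbers, reverse=True))   # [8, 7, 4, 3, 2]

# split() with no argument splits on white space.
molecules = 'rna dna protein'.split()
print(molecules, len(molecules))       # ['rna', 'dna', 'protein'] 3

# With an explicit separator it splits on that character instead.
dotted = 'rna.dna.protein'.split('.')
print(dotted)                          # ['rna', 'dna', 'protein']

# Whereas the default finds no white space here, so one item remains.
unsplit = 'rna.dna.protein'.split()
print(unsplit)                         # ['rna.dna.protein']
```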
Exercise: Functions
bases = 'adenine cytosine guanine thymine'
Write some code that will:
hint: use help(str) and help(list) to see what functions are available for strings and lists
Bonus: write a for loop to print the first letter of each (e.g. 'A, C, …')
One solution:
Writing your own function
Start by defining the function using def and indenting inside the newly defined function, just like we did in for loops and if statements:
Now we can use this function by passing in a value for the 'x' argument:
returns:
20
This function can also multiply floats and even strings!
will return:
EggsEggs
Now if we define a new variable:
Note that this 'x' will not be confused with the 'x' inside the double() function.
Making a reverse function:
Now we can use this new function:
will give you the reverse of seq
and
will reverse seq and then reverse it again back to its original state
Making a complement function:
Now we can use this function:
will give you the reverse complement of seq
NOTE: the comp() function will only work if we have defined the dictionary earlier in the script. It is a much better idea to include the comps dictionary inside the definition of the function:
We can combine these two functions into one function that does both in one step:
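The functions described in this section could be sketched as follows. The names double, comp, and comps come from the notes; reverse and reverse_complement are assumed names, and the bodies are one plausible implementation rather than the exact code written in class.

```python
def double(x):
    """Return x multiplied by 2 (works for numbers and strings)."""
    return x * 2


def reverse(seq):
    """Return the sequence reversed."""
    return seq[::-1]


def comp(seq):
    """Return the complement, with the comps dictionary kept inside the function."""
    comps = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    complement = ''
    for base in seq:
        complement += comps[base]
    return complement


def reverse_complement(seq):
    """Do both steps in one call."""
    return reverse(comp(seq))


print(double(10))        # 20
print(double('Eggs'))    # EggsEggs

x = 5                    # this x is separate from the x inside double()
print(double(3), x)      # 6 5

seq = 'ACCTGCATGC'
print(reverse(seq))              # CGTACGTCCA
print(reverse(reverse(seq)))     # ACCTGCATGC
print(reverse_complement(seq))   # GCATGCAGGT
```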
Working with files in Python
In Python, you open a file and store the file object in a variable so it can be used in your program
Now the data from ae.fa is stored in the variable f.
We can loop over the data in f:
but this will just do the same thing as the cat command on the command line!
Let's make a function to interpret the fasta file in a way that makes the different entries easier to work with. To start, let's convert our code from above into a function:
Step 1:
Step 2: add to list instead of printing and return that list
Step 3: is the line a sequence line or identifier line?
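The three steps could end up in a function like the sketch below. It assumes identifier lines start with '>' (the usual FASTA convention) and returns a list of (identifier, sequence) tuples; the function name read_fasta matches the script name used later in these notes, but the body is an illustration, not the class solution. A small temporary file stands in for ae.fa so the example runs on its own.

```python
import os
import tempfile


def read_fasta(filename):
    """Return a list of (identifier, sequence) tuples from a FASTA file."""
    records = []
    identifier = None
    sequence = ''
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line.startswith('>'):
                # Identifier line: save the previous record, start a new one.
                if identifier is not None:
                    records.append((identifier, sequence))
                identifier = line[1:]
                sequence = ''
            elif line:
                # Sequence line: a sequence may span several lines.
                sequence += line
    if identifier is not None:
        records.append((identifier, sequence))
    return records


# Toy file used only for this demonstration.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'toy.fa')
    with open(path, 'w') as out:
        out.write('>seq1\nACGT\nTTGA\n>seq2\nGGCC\n')
    records = read_fasta(path)

print(records)   # [('seq1', 'ACGTTTGA'), ('seq2', 'GGCC')]
```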
Once you have a function working, you can download it from Jupyter Notebooks as a python script (document with a .py file extension) and use it on your computer.
Rename the file to something more readable if it was given an automated name:
mv download.html read_fasta.py
and you can run the script with:
python read_fasta.py
which can be piped to
wc -l
python read_fasta.py | wc -l
or used with any other command line tools.
Tuesday - Version Control
Git Lesson Online Reference: http://swcarpentry.github.io/git-novice/
https://github.com/Duke-GCB/scicomp-python/blob/master/intro-programming-python.ipynb
Python script to use for today. Save as a .py file.
General format of git commands
Git commands
git init - Initialize a git repository
git status - Status of your git repository
git config - Sets configurations for git
git add - Lets git know that you want to make updates to a file
git commit - Makes a snapshot of the update
git diff - Shows what has changed
git log - Gives a log of the commits
git checkout - Lets you navigate to a branch
git reset - Undo changes to local git
git revert - Undo changes by making a commit reversing the previous commit
Within the python-fasta folder, use git init to initialize a repository and git status to check the status
Next, use the following git config commands
Nano is a terminal text editor.
After git commit, the editor will open. Then type in the message for the commit. Use git status to check that the python-fasta.py file is no longer untracked. After you make changes to the file, running git status should list the file under modified: python-fasta.py
Use git diff to show the changes made. Then git diff --staged will show the change that has been staged.
Use git log or git log --oneline to see the commit history, and git checkout 'first_7_chars_in_commit_checksum' -- python-fasta.py to restore the file as it was in that commit
About git reset, restore, and revert: https://git-scm.com/docs/git#_reset_restore_and_revert
General workflow for making commits:
git add filename
git commit, then make comments on the commit
More with Git (Tuesday Afternoon session)
Branching
Making a new branch allows you to maintain different versions of your work simultaneously.
git branch shows that you are on the master branch
To make a new branch AND move to the new branch in one step:
git checkout -b file-as-param
-b means create a new branch
git checkout means go to this branch
Now when you run git status, it will say: On branch file-as-param at the top of the output message.
On this new branch, we will modify our python-fasta.py script to take an input file as an argument.
Open python-fasta.py with nano and add import sys to the top of the file, then add sys.argv[1] in place of the file name to indicate that you want the first input from the command line arguments to be used here.
Now to run this script, you will have to enter:
And the script will run like it did before (check that it works before updating in git).
Save your file and exit nano, then again stage your changes:
git add python-fasta.py
and then commit in a quicker way using the command line argument -m (stands for "message"):
git commit -m 'Parameterizes name of the file'
git log will show this new commit, and will also show that it is on the branch file-as-param.
Next we will make one more change to our script to give a nicer error message if we forget to give a filename as an argument when running the script:
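A sketch of what such a check might look like. The helper names get_filename and main and the wording of the usage message are assumptions for illustration; the script written in class may differ. The argument list is passed in explicitly so the logic can be tried without the command line.

```python
import sys


def get_filename(argv):
    """Return the first command line argument, or None if it is missing."""
    if len(argv) < 2:
        return None
    return argv[1]   # argv[0] is the script name itself


def main(argv):
    filename = get_filename(argv)
    if filename is None:
        # Friendlier than the IndexError raised by sys.argv[1] alone.
        print('Usage: python python-fasta.py <fasta_file>')
        return 1
    print('Reading', filename)
    return 0


# In the real script this would be: main(sys.argv)
missing = main(['python-fasta.py'])            # prints the usage message
given = main(['python-fasta.py', 'ae.fa'])     # prints: Reading ae.fa
```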
Now commit your updates:
git add
git commit -m 'Catches case of missing filename'
Next we want to go back to the "master" branch, which has the version of python-fasta.py that does not take the filename as an argument.
Check which branch you are currently on using git branch. The branch with the '*' next to it is the branch you are currently on.
Note: running git checkout -b master won't work because the -b argument means "make a new branch", but this branch already exists.
Instead, run: git checkout master to get back to the master branch.
We can check that this worked by again running git log --oneline which shows that HEAD is pointing to 'master' (HEAD -> master) in the output.
What is a branch name in git?
git log --oneline shows you the commits that came before the HEAD node e.g.
Working with branches
Make a new branch:
git checkout -b ignore-data
And double check that you successfully switched using git branch (you should have a '*' next to a branch named 'ignore-data')
We will make a new hidden file so git will ignore certain data files that you don't want to back up.
nano .gitignore to open a new file.
Type into your new document:
And save and exit.
Now when you run git status you won't see any files ending with '.fa' in your "Untracked files" section.
You can open .gitignore again with nano and add:
Run git add .gitignore and git commit -m 'Ignores data and notebook files' to save your changes with git.
Merging
To merge a branch with another one, you first have to checkout the branch you want to merge into. We want to merge our ignore-data branch with the master branch, so first:
git checkout master
then you can merge the branch you want into the branch you are currently on with the command:
git merge ignore-data
This has created a "fast forward" in git, where the pointer for "master" gets moved "forward" to a later commit (ignore-data) node. The HEAD pointer also is fast forwarded to the ignore-data node.
Now, git log --oneline will show:
Fast forwarding happens when you didn't update anything on the master branch so you're just "fast forwarding" it to a later commit node.
What if we want to do something more complicated and merge two branches with different changes?
git merge file-as-param creates a merge commit by opening a new file in nano. Save and close this file to commit the merge.
Git can usually figure out how to merge two different branches, but sometimes there will be conflicts to be resolved.
Adding your repository to the cloud
Make a new repository on the Github website called python-fasta. You can choose to make it either public or private, but since we want to work with collaborators easily, we will make it public today.
NOTE Do NOT click the option to initialize this repository, because it already exists on our local computer. Only do this if you don't already have a repository created on your computer.
Github will take you to a helpful instructions page. We want to push an existing repository from the command line:
git remote add origin https://github.com/username/python-fasta.git
Git is what is known as a distributed version control system, meaning the repository exists fully on your local computer, but can also be mirrored elsewhere (like your github profile online, on a server, or on a different computer).
If you enter the command git remote or git remote -v (for more verbose output) it will tell you your two command options:
To move your changes from a local computer TO online (remote): push
To move changes from the online version TO your local machine: fetch (sometimes pull)
Enter git push -u origin master to add your changes to your online github repository.
The -u option here means "upstream". We will discuss this more later.
Enter your Github username and password when it prompts you to.
If you view this repository on the Github website, we can see it is no longer empty, however it doesn't yet have all of the commits and branches we've made today.
git push --all origin will add the other branches and their commits to the online repository.
Collaborating with others
You can collaborate with others by looking at their code on Github and modifying it. If you'd like to submit your changes to another person's repository, you can submit a pull request. The other person may or may not accept your changes, but this is one avenue through which you can collaborate with others. More on pull requests here: https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests
To pull from someone else's repository, use the command:
git clone <github-url> where <github-url> is just the url of the repository you want to clone. This will "clone" their repository from github (online) onto your local computer.
It is important to work on your own branch of someone else's repository so you don't create conflicts.
git checkout -b add-docs
Now we will edit their copy of python-fasta.py by adding some documentation.
git add python-fasta.py to stage your changes.
git commit to open nano and add the message:
Using git branch still works to see how this collaborator's repository is structured (what commits have been added in the past), but you need to add the -a argument to see everything in the remote branch: git branch -a
We have staged and committed our changes, but we still have to push to the remote repository online.
git push -u origin add-docs
-u sets up tracking of the remote branch, which is very helpful when you want to continue collaborating (i.e. if the other user has accepted your pull request)
-u also allows you to just enter git push in the future instead of having to add "origin" and the branch name every time.
Let's say this other user accepted our pull request. Now we need to get up to date with their master branch:
git fetch origin
So now it's on our computer, but we have to merge:
git merge origin/master
We can see our pull request merged with the master branch if we use:
git log --oneline
NOTE: If we had used git pull, it would have done the same thing as the two steps we did to update from the online repository (git fetch and git merge). It can be useful to do this in two stages to make sure it's doing the right thing, but usually git pull works fine.
Running programs on HPC
Login to the cluster
ssh netid@dcc-slogin-duke.oit.duke.edu
Enter password, then you should see a prompt to run additional commands
hostname gives the name of the host computer
srun hostname connects to a remote server in the cluster and gives its hostname.
srun submits a job; you are assigned a job ID and the job is put into a queue.
Interactive job
srun --pty bash
Gives you a bash shell (bash) and lets you interact with it (--pty)
Some commands to run in the interactive session
pwd - present working directory
free - available memory
exit - ends the interactive job
Exit and go to the login node:
Then cd into the new folder created, scicomp-hpc. Check out the read_fasta.py file with
less fasta_gc.py
To run the file on the cluster:
To run a set of commands, use sbatch. Use man sbatch to get more information
Write a script to use with sbatch with nano,
nano countgc.sh
In the editor type:
Save the file. To submit the file run
sbatch countgc.sh
By default, it will make a file with the job id in the current directory. Ex: slurm-4272224.out
Results will be placed in the .out file. That file is updated as the job runs.
Checking job status in HPC
squeue - indicates the queue of jobs
sacct - when the job is done, it is placed on the accounting list
squeue -u netid - will give the jobs for just the NetID selected
sacct options
sacct --mem - amount of memory the job used
sacct -e - all options sacct can show
By default the cluster requests 2GB of memory for each job.
You can edit the script to request a different amount of memory with
#SBATCH --mem=requested_amount
Ex, requesting 100MB of memory:
nano countgc.sh
Other options to specify in the script: https://slurm.schedmd.com/sbatch.html
Add the following to have the HPC email you when the job is done
Placing srun in front of the commands in the bash script will indicate via sacct if specific parts of the script fail
#SBATCH --array=1-5%2 runs the script as a job array with indices 1 to 5, with at most 2 of them running at a time.
Adding variables to the bash script
Moving files in/out of the cluster (Afternoon session)
NOTE: Github should be used primarily for small text/script files, CSV files, etc, NOT for backing up your data, as there is a ~1 GB storage limit.
Reminder, you can redirect command line output with >:
echo "Data Staging" > datafile.txt
scp (stands for "secure copy") allows you to securely copy data files to the server:
scp datafile.txt username@dcc-slogin.oit.duke.edu:datafile.txt
Because the path after the : is relative, the copied file will end up in your home directory
Another way to download data to the server is with the command wget if you have a url for the data you want.
srun wget http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/chromosomes/chrI.fa.gz
Other files are available in online databases: e.g. http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/chromosomes/
It's a good idea to move data files out of your home directory on the server, as these usually have limited storage space.
Finding software on the HPC
Modules
Check what software is available with:
module avail
To use software use the command:
module load <X>
e.g.
module load RepeatMasker/4.0.5
Check that it is working with module list, which should list the pieces of software you have loaded
You can load multiple modules at one time. Load another program: module load Perl/5.30.0 and run module list again to see that now there are 2 items.
Now you can run the software you have loaded. e.g.
RepeatMasker -species "Saccharomyces cerevisiae" chrI.fa.gz
Reminder: use command srun --pty bash to start an interactive session.
module purge will unload any software you had loaded. You can double check that this worked by again entering module list to see that it is now empty.
HARDAC is another computing cluster at Duke: https://wiki.duke.edu/x/QgSsAw
Note: saving your scripts (and using version control) is crucial for being able to reproduce your own results and allowing others to repeat your experiments.