Here are the instructions for running RMTA on OSG (Open Science Grid)
$ ssh <username>@login.osgconnect.net # username is your username
The sample data can be found in the sample_data folder in here
git clone https://github.com/Evolinc/OSG-RMTA.git
cd OSG-RMTA/sample_data_osg
In the sample_data_osg
folder you will find input files and the following scripts for job submission to OSG
Here is an example of Job description file (osg-rmta.submit
) for running RMTA
# The UNIVERSE defines an execution environment. You will almost always use VANILLA.
Universe = vanilla
# These are good base requirements for your jobs on OSG. It is specific on OS and
# OS version, core cound and memory, and wants to use the software modules.
Requirements = HAS_SINGULARITY == True
request_cpus = 1
request_memory = 2 GB
request_disk = 4 GB
# Singularity settings
+SingularityImage = "/cvmfs/singularity.opensciencegrid.org/evolinc/osg-rmta:2.1"
# EXECUTABLE is the program your job will run It's often useful
# to create a shell script to "wrap" your actual work.
Executable = osg-rmta-wrapper.sh
Arguments =
# inputs/outputs
transfer_input_files = osg-rmta.sh, Sorghum_bicolor.Sorbi1.20.dna.toplevel_chr8.fa, Sorghum_bicolor.Sorbi1.20_chr8.gtf, sample_1_R1.fq.gz, sample_1_R2.fq.gz
transfer_output_files = final_out, index
# ERROR and OUTPUT are the error and output channels from your job
# that HTCondor returns from the remote host.
Error = $(Cluster).$(Process).error
Output = $(Cluster).$(Process).output
# The LOG file is where HTCondor places information about your
# job's status, success, and resource consumption.
Log = $(Cluster).$(Process).log
# Send the job to Held state on failure.
on_exit_hold = (ExitBySignal == True) || (ExitCode != 0)
# Periodically retry the jobs every 1 hour, up to a maximum of 5 retries.
periodic_release = (NumJobStarts < 5) && ((CurrentTime - EnteredCurrentStatus) > 60*60)
# QUEUE is the "start button" - it launches any jobs that have been
# specified thus far.
Queue 1
Here is an example of executable script (osg-rmta.sh
)
#!/bin/bash
Hisat2-Cuffcompare-Cuffmerge.sh -g Sorghum_bicolor.Sorbi1.20.dna.toplevel_chr8.fa -A Sorghum_bicolor.Sorbi1.20_chr8.gtf -l "FR" -1 sample_1_R1.fq.gz -2 sample_1_R2.fq.gz -O final_out -p 6 -5 0 -3 0 -m 20 -M 50000 -q -t -f 2 -k 2
Here is the wrapper script (osg-rmta-wrapper.sh
)
#!/bin/bash
bash osg-rmta.sh > osg-rmta.out
Submit the job using condor_submit
.
$ condor_submit osg-rmta.submit
Your first job is on the grid! The condor_q
command tells the status of currently running jobs. Generally you will want to limit it to your own jobs by adding your own username to the command.
condor_q <username>
Once your job has finished, you can look at the files that HTCondor has returned to the working directory. If everything was successful, it should have returned:
final_out
which contains bam, gtf and other files
index
which contains the indices of the reference genome