# RMTA on OSG Here are the instructions for running RMTA on OSG (Open Science Grid) ## Login to Submit host ``` $ ssh <username>@login.osgconnect.net # username is your username ``` ## Run OSG-RMTA on the sample data The sample data can be found in the sample_data folder in [here](https://github.com/Evolinc/OSG-RMTA/tree/master/sample_data_osg) ``` git clone https://github.com/Evolinc/OSG-RMTA.git cd OSG-RMTA/sample_data_osg ``` In the `sample_data_osg` folder you will find input files and the following scripts for job submission to OSG ### Job description file Here is an example of Job description file (`osg-rmta.submit`) for running RMTA ``` # The UNIVERSE defines an execution environment. You will almost always use VANILLA. Universe = vanilla # These are good base requirements for your jobs on OSG. It is specific on OS and # OS version, core cound and memory, and wants to use the software modules. Requirements = HAS_SINGULARITY == True request_cpus = 1 request_memory = 2 GB request_disk = 4 GB # Singularity settings +SingularityImage = "/cvmfs/singularity.opensciencegrid.org/evolinc/osg-rmta:2.1" # EXECUTABLE is the program your job will run It's often useful # to create a shell script to "wrap" your actual work. Executable = osg-rmta-wrapper.sh Arguments = # inputs/outputs transfer_input_files = osg-rmta.sh, Sorghum_bicolor.Sorbi1.20.dna.toplevel_chr8.fa, Sorghum_bicolor.Sorbi1.20_chr8.gtf, sample_1_R1.fq.gz, sample_1_R2.fq.gz transfer_output_files = final_out, index # ERROR and OUTPUT are the error and output channels from your job # that HTCondor returns from the remote host. Error = $(Cluster).$(Process).error Output = $(Cluster).$(Process).output # The LOG file is where HTCondor places information about your # job's status, success, and resource consumption. Log = $(Cluster).$(Process).log # Send the job to Held state on failure. on_exit_hold = (ExitBySignal == True) || (ExitCode != 0) # Periodically retry the jobs every 1 hour, up to a maximum of 5 retries. periodic_release = (NumJobStarts < 5) && ((CurrentTime - EnteredCurrentStatus) > 60*60) # QUEUE is the "start button" - it launches any jobs that have been # specified thus far. Queue 1 ``` ### Executable script Here is an example of executable script (`osg-rmta.sh`) ``` #!/bin/bash Hisat2-Cuffcompare-Cuffmerge.sh -g Sorghum_bicolor.Sorbi1.20.dna.toplevel_chr8.fa -A Sorghum_bicolor.Sorbi1.20_chr8.gtf -l "FR" -1 sample_1_R1.fq.gz -2 sample_1_R2.fq.gz -O final_out -p 6 -5 0 -3 0 -m 20 -M 50000 -q -t -f 2 -k 2 ``` ### Wrapper script Here is the wrapper script (`osg-rmta-wrapper.sh`) ``` #!/bin/bash bash osg-rmta.sh > osg-rmta.out ``` ### Job submission Submit the job using `condor_submit`. ``` $ condor_submit osg-rmta.submit ``` ### Job status Your first job is on the grid! The `condor_q` command tells the status of currently running jobs. Generally you will want to limit it to your own jobs by adding your own username to the command. ``` condor_q <username> ``` ### Job output Once your job has finished, you can look at the files that HTCondor has returned to the working directory. If everything was successful, it should have returned: - `final_out` which contains bam, gtf and other files - `index` which contains the indices of the reference genome