# How to Run `ipyrad` on Hydra
***All of these steps should be run on one of the login nodes.***
## Create an `IPython` Profile Compatible with Hydra
(You only have to do this once)
1. Load the ipyrad module:
```
$ module load bioinformatics/ipyrad/0.7.29
```
1. Configure ipyrad to run on Hydra by running `config4hydra`:
```
$ config4hydra
```
- This script creates an "sge" `IPython` profile. (The name *sge* is arbitrary, but it is used throughout the rest of this walk-through.) It creates a directory called `~/.ipython` in your HOME directory and prepopulates it with default config files that have been edited to use Hydra's Grid Engine (GE).
3. **From your ipyrad working directory:** run the `cp_templates` command to copy the 3 template files (`template`, `run-ipyrad.job` and `start-stop-ipcluster.csh`) into your working directory (a quick check follows below):
```
$ cp_templates
```
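- To verify that both steps worked, check that the profile directory and the three template files exist. (This assumes the default profile name `sge`; the `profile_sge` location follows the standard `IPython` layout.)
```sh
# the Hydra-ready IPython profile created by config4hydra
ls ~/.ipython/profile_sge/
# the three template files copied by cp_templates
ls template run-ipyrad.job start-stop-ipcluster.csh
```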
## Testing `ipcluster` on Hydra
(Again, you only have to do this once, to ensure that the previous steps were done correctly)
1. **From your ipyrad working directory:** start `ipcluster` as follows: `ipcluster start -n N --profile=sge --daemonize`, where you replace `N` with the number of 'engines' to start (for example, 4). This will start N+1 jobs: one `ipcontroller` and N `ipengine`s, submitted as N tasks of one job array.
`$ ipcluster start -n 4 --profile=sge --daemonize`
1. Check that the N+1 jobs have been started by the GE and are (eventually) in the 'r' (running) state.
*Make sure to wait at least 1 minute, to get past the 60-second delay that we programmed into the config files (see Appendix).*
```
$ module load tools/local
$ q+ +a%
```
1. If you see N+1 entries in the queue, everything is working properly. (An alternative check with plain `qstat` is shown after this list.)
2. Now stop the `ipcluster`, otherwise it will keep running with N+1 jobs doing nothing:
`$ ipcluster stop --profile=sge`
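- As an alternative to `q+`, the same check can be done with plain `qstat` by counting the running `controller` and `engine` jobs; this mirrors what `start-stop-ipcluster.csh` does internally (see the script further below). With N=4 you should get:
```sh
qstat -s r -u $USER | grep -c " controller "   # should print 1
qstat -s r -u $USER | grep -c " engine "       # should print 4 (i.e., N)
```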
## Running an `ipyrad` Job on Hydra
***All of these steps should be run on one of the login nodes, using a distinct working directory.***
1. Prepare a parameter file following the instructions here: https://ipyrad.readthedocs.io/tutorial_intro_cli.html#create-an-ipyrad-params-file
1. That params file will later be used when `ipyrad` is started via the job file. You can create it with something like this:
`$ ipyrad -n project_name`
which writes `params-project_name.txt` in the current directory; then edit the parameters appropriately with a text editor.
1. You can adjust two parameters in the `start-stop-ipcluster.csh` file (an example of changing both follows the script below):
* The first is `N`, the number of `IPython` "engines" to run, which we have set to **20** on line 4.
* The second is the queue and the amount of memory _each_ `IPython` "engine" requests. This is done by editing the value of the `queueSpec` variable on line 5.
- By default, the engines run in the long high-memory queue `lThM.q`, requesting **30GB** per core. This has been tested on a large dataset and should be sufficient for most projects.
- The default `start-stop-ipcluster.csh` file looks like this:
```sh
#!/bin/csh
#
# no of engines, queue specification, stop file name, how often to check for the stop file, and the ipcluster profile name
@ N = 20
set queueSpec = 'lThM.q -l himem,mres=30G'
set stopFile = stop-ipcluster-profile=sge.now
set waitTime = 5m
set ipProfile = sge
#
# load the ipyrad module
module load bioinformatics/ipyrad/0.7.29
#
# remove the stop file, if there is one
rm -f $stopFile
#
# start the predefined IPython cluster (SGE type), with the queue spec, --daemonize MUST be last arg
echo + `date` ipcluster start --n $N --profile=$ipProfile --BatchSystemLauncher.queue="$queueSpec" --daemonize
ipcluster start --n $N --profile=$ipProfile --BatchSystemLauncher.queue="$queueSpec" --daemonize
#
# wait to let all the engines start,
# by qstating and counting the engines every 30 sec, for 10 passes (5m)
@ nPassMax = 10
@ iPass = 1
loop:
echo + `date` sleep 30
sleep 30
@ nc = `qstat -s r -u $USER | grep -c " controller "`
@ ne = `qstat -s r -u $USER | grep -c " engine "`
echo + `date` "$nc controller and $ne engine(s) running in the queue"
if ($nc == 0 || $ne != $N) then
if ($iPass > $nPassMax) then
echo + `date` "no controller or wrong number of engine(s) in the queue at pass # $iPass"
echo + `date` ipcluster stop --profile=$ipProfile
ipcluster stop --profile=$ipProfile
echo + `date` $iPass passes, exiting
exit 1
endif
@ iPass++
goto loop
endif
#
# submit the job, passing on the stop file name and the ipProfile
echo + `date` qsub run-ipyrad.job $stopFile $ipProfile
qsub run-ipyrad.job $stopFile $ipProfile
#
# now loop until the stop file is found
echo + `date` looking for $stopFile every $waitTime
@ i = 0
while (! -e $stopFile)
sleep $waitTime
@ i++
if ( ($i % 12) == 0) echo + `date` looking for $stopFile every $waitTime
end
#
# job completed, stopping the IPython cluster
echo + `date` ipcluster stop --profile=$ipProfile
ipcluster stop --profile=$ipProfile
#
# we're done
exit
```
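For example, to run 10 engines, each requesting 16 GB instead of 30 GB, you would change lines 4 and 5 of the script to something like this (illustrative values; the resource syntax follows the default shown above, so adjust the queue and memory to what your project actually needs):
```sh
# 10 engines instead of 20, each requesting 16 GB in the long high-memory queue
@ N = 10
set queueSpec = 'lThM.q -l himem,mres=16G'
```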
4. The default template file, `template`, specifies the parameters needed to run `ipcluster` (which `ipyrad` uses) on Hydra. It should not need to be edited; by default it looks like this (an illustrative rendered version follows the listing):
```sh
#
#$ -cwd -j y -o $JOB_NAME.$JOB_ID.$TASK_ID.log
#$ -q {queue}
#$ -t 1-{n}
#
module load bioinformatics/ipyrad/0.7.29
set type = $JOB_NAME
set echo
python -m ipyparallel.$type --profile-dir="{profile_dir}" --cluster-id="{cluster_id}"
```
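When `ipcluster` starts, it fills in the `{queue}`, `{n}`, `{profile_dir}` and `{cluster_id}` placeholders and writes the `controller` and `engine` files it submits (see the note on generated files further below). Purely for illustration, with N=4 and the default `queueSpec` the generated `engine` file would look roughly like this (the home path and cluster id shown here are placeholders and will differ on your account):
```sh
#
#$ -cwd -j y -o $JOB_NAME.$JOB_ID.$TASK_ID.log
#$ -q lThM.q -l himem,mres=30G
#$ -t 1-4
#
module load bioinformatics/ipyrad/0.7.29
set type = $JOB_NAME
set echo
python -m ipyparallel.$type --profile-dir="/home/username/.ipython/profile_sge" --cluster-id=""
```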
1. The job file `run-ipyrad.job` submits your `ipyrad` job on Hydra and uses the `ipcluster` jobs spun up by `start-stop-ipcluster.csh`.
The one line you will want to change is the `ipyrad` command line (the `ipyrad -p ...` line near the end of the file): set it to the parameter file created for this run and the list of steps to execute (an example follows the listing below).
```sh
#!/bin/csh
#$ -cwd -j y -o run-ipyrad.log
#$ -N ipyrad
#$ -q lThC.q
#
echo + `date` job $JOB_NAME started in $QUEUE with jobID=$JOB_ID on $HOSTNAME
#
set stopFile = $1
set ipProfile = $2
module load bioinformatics/ipyrad/0.7.29
#
# now start ipyrad
ipyrad -p params-project_name.txt -s 1234567 -t 1 --ipcluster=$ipProfile
#
# when done, tell start-stop to shut down the IPython cluster
date > $stopFile
#
echo = `date` job $JOB_NAME done
```
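For example, if your parameter file is `params-my_project.txt` (a placeholder name) and you only want to run steps 1 through 3, the `ipyrad` line would become:
```sh
ipyrad -p params-my_project.txt -s 123 -t 1 --ipcluster=$ipProfile
```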
1. In your working directory you should now have 4 files:
```
$ ls -l
-rw-rw-r-- params-project_name.txt
-rw-rw-r-- run-ipyrad.job
-rwxrwxr-x start-stop-ipcluster.csh
-rwx------ template
```
- **NOTE**: Make sure that `start-stop-ipcluster.csh` is executable and that the `template` file has the correct permissions; if not, do the following:
`$ chmod +x start-stop-ipcluster.csh`
`$ chmod 700 template`
1. Finally, start your `ipyrad` job as follows:
`$ ./start-stop-ipcluster.csh &> start-stop-ipcluster-login01.log &`
* The `&` at the end of the line runs the command in the background, so that you can continue working on Hydra while the job runs. If you start it on login02, adjust the log file name accordingly.
* That script starts and stops everything: `ipcluster`, one `controller`, N `engines`, and the `ipyrad` job.
* `ipcluster` produces a `controller` and an `engine` file from the `template` file, as well as a controller log file (`controller.NNNN.1.log`) and N engine log files (`engine.NNNN.M.log`). The `ipyrad` job produces a log file as well (`run-ipyrad.log`). You can use these logs to monitor progress, as shown below.
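For instance, while the job is running you can keep an eye on it from the login node like this (the log file names follow the defaults above):
```sh
# follow the ipyrad progress output
tail -f run-ipyrad.log
# check the start/stop script's own log (the name you chose when launching it)
tail start-stop-ipcluster-login01.log
# list the controller, engine and ipyrad jobs still in the queue
qstat -u $USER
```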
## Stopping a Job
* The `start-stop-ipcluster.csh` script initiates `ipcluster`, starts the `controller` and `engine` jobs, and submits the `ipyrad` job on Hydra using the GE.
* When the `ipyrad` job completes, it generates a stop file (`stop-ipcluster-profile=sge.now`) that is detected by the `start-stop-ipcluster.csh` script, which then stops all the running `ipcluster` processes in the queue and on the login node.
* So if all goes well, everything stops on its own _cleanly_.
**If you wish to kill a running ipyrad job** for whatever reason, you _must_ do so with the following two steps:
* (1) kill the `ipyrad` job with `qdel`, and
* (2) create the `stop-ipcluster-profile=sge.now` file in your working directory, as follows:
`$ echo kill >stop-ipcluster-profile=sge.now`
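For example, assuming `qstat -u $USER` shows your `ipyrad` job with job ID 1234567 (an illustrative number):
```sh
qdel 1234567                                 # (1) kill the ipyrad job
echo kill > stop-ipcluster-profile=sge.now   # (2) tell start-stop-ipcluster.csh to shut everything down
```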
## Before Submitting any Subsequent ipyrad Jobs
1. Please check that you have no leftover `ipcluster` processes running on the login node.
`$ top -u <your_user_name>`
Each user should only have `sshd` and `bash` processes running by default. You will also see an entry for the `top` command you just ran, i.e. something like this:
```
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11010 gonzalez 20 0 16096 2308 848 R 3.7 0.0 0:00.04 top
82511 gonzalez 20 0 101m 1984 956 S 0.0 0.0 0:00.05 sshd
82512 gonzalez 20 0 106m 2004 1468 S 0.0 0.0 0:00.08 bash
```
* You exit `top` by hitting the `q` key.
* If you have other processes running, kill them with `kill -9 <PID>`, where `<PID>` is the process id listed by `top`.
2. Check the queue for `controller` or `engine` jobs, with `qstat` or `q+`.
3. **CAVEAT**: Each user can only have one `ipyrad` instance running at a time.
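To do both checks quickly from the command line (assuming the default job names `controller`, `engine` and `ipyrad`):
```sh
# any leftover ipcluster/ipcontroller processes on this login node?
ps -fu $USER | egrep -i "ipcluster|ipcontroller" | grep -v grep
# any leftover controller, engine or ipyrad jobs still in the queue?
qstat -u $USER | egrep "controller|engine|ipyrad"
```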
## Appendix: Changes made to config files to make them suitable for Hydra
1. Edit `ipcluster_config.py`
```python
#line 127
c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
#line 178
c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
#line 187
c.IPClusterStart.delay = 60.0
#line 388
c.BatchSystemLauncher.queue = u'mThC.q'
```
1. Edit `ipcontroller_config.py`
```python
#Line 239
c.RegistrationFactory.ip = u'*'
#Line 259
c.HubFactory.client_ip = u'*'
```
1. Edit `ipengine_config.py`
```python
#Line 377
c.RegistrationFactory.ip = u'*'
```
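For reference, `config4hydra` creates this profile and its config files for you. If you ever need to regenerate the profile by hand before re-applying the edits above, the standard `IPython` command is:
```sh
# creates ~/.ipython/profile_sge/ with the default ipcluster_config.py,
# ipcontroller_config.py and ipengine_config.py files
ipython profile create sge --parallel
```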