# Virtual Data Submission Workshop

###### tags: `Spring 2022` `Brain Image Library` `workshop`

## Resources

As a member of the [Brain Image Library](https://www.brainimagelibrary.org/) project you have access to

* The virtual machine `workshop.bil.psc.edu`
  * The VM has 56 CPUs and about 3 TB of memory
* 8 large-memory compute nodes that can be accessed using SLURM from within the virtual machine

:::info
:bulb: The VM `workshop.bil.psc.edu` will generally be online for use by members of the Brain Image Library. If a resource is unavailable, or becomes unavailable because of updates or upgrades, you will receive a notification from the team.
:::

## Connecting to the `workshop` VM

Open a terminal and run the command

```
ssh <your-username>@workshop.bil.psc.edu
```

For example,

```
ssh icaoberg@workshop.bil.psc.edu
icaoberg@workshop.bil.psc.edu's password:
Last login: Mon Jan 24 10:46:38 2022 from pool-71-162-2-190.pitbpa.fios.verizon.net
********************************* W A R N I N G ********************************
You have connected to workshop.bil.psc.edu

This computing resource is the property of the Pittsburgh Supercomputing Center.
It is for authorized use only. By using this system, all users acknowledge
notice of, and agree to comply with, PSC polices including the Resource Use
Policy, available at http://www.psc.edu/index.php/policies. Unauthorized or
improper use of this system may result in administrative disciplinary action,
civil charges/criminal penalties, and/or other sanctions as set forth in PSC
policies. By continuing to use this system you indicate your awareness of and
consent to these terms and conditions of use.

LOG OFF IMMEDIATELY if you do not agree to the conditions stated in this warning

Please contact support@psc.edu with any comments/concerns.
********************************* W A R N I N G ********************************
```

If you see the message above when you connect, you are ready to start using the resources.
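If you connect often, an entry in your local `~/.ssh/config` can shorten the command. The sketch below is only an illustration: the function name `bil_ssh_config` and the alias `bil-workshop` are made up for this example, while `Host`, `HostName`, and `User` are standard OpenSSH client options. The function only prints the entry, so you can review it before adding it to your configuration.

```shell
#!/bin/bash

# Hypothetical helper: print an ssh_config entry for the workshop VM.
# The alias "bil-workshop" is an assumption made for this example.
bil_ssh_config() {
    local user="$1"
    if [ -z "$user" ]; then
        echo "usage: bil_ssh_config <your-username>" >&2
        return 1
    fi
    cat <<EOF
Host bil-workshop
    HostName workshop.bil.psc.edu
    User ${user}
EOF
}

# Review the entry first; append it to ~/.ssh/config only if it looks right:
#   bil_ssh_config icaoberg >> ~/.ssh/config
# Afterwards the connection command shortens to:
#   ssh bil-workshop
```

Run this on your own laptop or desktop, not on the VM, because the alias is resolved by the SSH client on the machine you connect from.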
## LMOD

<img src='https://i.imgur.com/TiNg8y8.png' width="25%" />

Lmod is a Lua-based module system that easily handles the hierarchical `MODULEPATH` problem. Environment modules provide a convenient way to dynamically change the users’ environment through modulefiles.

In a nutshell, we use Lmod to manage the software that can be used on the VM as well as on the large-memory nodes. Software available as modules should be accessible on both resources.

This document only lists a few commands. For complete documentation click [here](https://lmod.readthedocs.io/en/latest/010_user.html).

:::info
:bulb: If you want us to install a piece of software on our resources, please submit a software installation request to `bil-support@psc.edu`.
:::

### Listing available modules

To list all available software modules use the command

```
module avail
```

For example

```
module avail

-------------- /bil/modulefiles ---------------
anaconda/3.2019.7          bioformats/6.1.1                c-blosc/1.19.0(default)
anaconda3/4.10.1           bioformats/6.4.0                dust/0.5.4
aspera/3.9.6(default)      bioformats/6.5.1                ffmpeg/20210611
bcftools/1.9(default)      bioformats/6.6.1(default)
bioformats/6.0.1           bioformats2raw/0.2.4(default)
```

The command above lists all available software.

:::info
:envelope: Cannot find the software you need to explore the collections? Then please send a request to `bil-support@psc.edu`.
:::

### Listing specific modules

To list specific modules use the command

```
module avail <package-name>
```

For example,

```
module avail matlab

-------------- /bil/modulefiles ---------------
matlab/2019a  matlab/2021a
```

### Listing useful information

To list useful information about a module use the command

```
module help <package-name>
```

For example,

```
module help matlab

----------- Module Specific Help for 'matlab/2021a' ---------------
Matlab 2021a
------------

To enable, first load the following required modules (via module load command):

module load matlab/2021a

For a full list of binaries included in this module, type

module what-is matlab/2021a
```

### Loading modules

To load a module use the command

```
module load <package-name>
```

For example,

```
module load matlab/2021a
```

Running the command above makes the `matlab` binary available in the current session

```
which matlab
/bil/packages/matlab/R2021a/bin/matlab
```

In this example, you can simply type `matlab` to start the Matlab engine

```
→ matlab -nodesktop
MATLAB is selecting SOFTWARE OPENGL rendering.

                    < M A T L A B (R) >
          Copyright 1984-2021 The MathWorks, Inc.
     R2021a Update 5 (9.10.0.1739362) 64-bit (glnxa64)
                      August 9, 2021

To get started, type doc.
For product information, visit www.mathworks.com.

>>
```

#### Loading a specific version of a module

Sometimes multiple versions of the same software are available.
For example,

```
module avail bioformats

---------------------- /bil/modulefiles ----------------------
bioformats/6.0.1    bioformats/6.7.0
bioformats/6.1.1    bioformats/6.8.0(default)
bioformats/6.4.0    bioformats2raw/0.2.4
bioformats/6.5.1    bioformats2raw/0.3.0(default)
bioformats/6.6.1
```

If you wish to load a specific version of a package use the command

```
module load <package>/<version>
```

For example,

```
module load bioformats/6.4.0
```

### Listing loaded modules

To list the loaded modules use the command

```
module list
```

For example,

```
module list

Currently Loaded Modulefiles:
  1) matlab/2021a
```

### Unload module

To unload a module use the command

```
module unload <package-name>
```

For example,

```
module unload matlab/2021a
```

### Using modules in scripts

When building a script that uses more than one tool available as a module, add one `module load` command per tool

```
#!/bin/bash

module load matlab/2021a
module load bioformats
```

## SLURM

![](https://i.imgur.com/3gPmu1r.png)

[Slurm](https://slurm.schedmd.com/documentation.html) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.

This document only lists a few commands. For complete documentation click [here](https://slurm.schedmd.com/documentation.html).

### sinfo

```
sinfo - View information about Slurm nodes and partitions.

SYNOPSIS
    sinfo [OPTIONS...]
```

For example

```
sinfo -p compute
```

### squeue

```
squeue - view information about jobs located in the Slurm scheduling queue.

SYNOPSIS
    squeue [OPTIONS...]
```

For example

```
squeue -u icaoberg
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             14243   compute script.s icaoberg  R      15:34      1 l001
```

### scontrol

```
scontrol - view or modify Slurm configuration and state.

SYNOPSIS
    scontrol [OPTIONS...] [COMMAND...]
```

As a regular user you can view information about the nodes and jobs, but you won't be able to modify them.
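Because regular users have read-only access, `scontrol` is mostly useful for looking things up. Its `show` output is a series of `KEY=VALUE` pairs, so a small filter can pull out a single field. The sketch below is illustrative: `slurm_field` is a hypothetical helper name, not part of Slurm, and it assumes the value contains no spaces (a value with spaces is cut at the first space).

```shell
#!/bin/bash

# Hypothetical helper: read `scontrol show ...` output on standard input
# and print the value of the first KEY=VALUE pair whose key matches $1.
slurm_field() {
    local key="$1"
    # Put every whitespace-separated token on its own line, then keep the
    # first token whose text before the first "=" equals the requested key.
    tr -s ' \t' '\n' |
        awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }'
}

# Usage (on the VM), with a job id taken from `squeue -u <username>`:
#   scontrol show job <job_id> | slurm_field JobState
```

Splitting on whitespace first keeps the filter independent of how `scontrol` wraps its output lines, and stripping only the leading `KEY=` preserves values that contain `=` themselves.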
To view information about all the nodes use the command

```
scontrol show nodes
```

To view information about a specific node, add the node name to the command. For example

```
scontrol show nodes l002
NodeName=l002 Arch=x86_64 CoresPerSocket=20
   CPUAlloc=0 CPUTot=80 CPULoad=0.03
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=l002 NodeHostName=l002 Version=18.08
   OS=Linux 4.18.0-305.7.1.el8_4.x86_64 #1 SMP Tue Jun 29 21:55:12 UTC 2021
   RealMemory=3000000 AllocMem=0 FreeMem=3090695 Sockets=4 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=compute
   BootTime=2021-07-16T15:47:48 SlurmdStartTime=2021-08-03T20:58:25
   CfgTRES=cpu=80,mem=3000000M,billing=80
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
```

Because there is only one partition, you can run either `sinfo` or `sinfo -p compute` to gather basic information about it. For example

```
→ sinfo -p compute
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up   infinite      8   idle l[001-008]
```

### sbatch

```
sbatch - Submit a batch script to Slurm.

SYNOPSIS
    sbatch [OPTIONS(0)...] [ : [OPTIONS(N)...]] script(0) [args(0)...]
```

### srun

```
srun - Run parallel jobs

SYNOPSIS
    srun [OPTIONS(0)... [executable(0) [args(0)...]]] [ : [OPTIONS(N)...]] executable(N) [args(N)...]
```

For example, to allocate an 8-hour interactive debugging session you can type

```
srun -p compute --mem=16Gb --time=08:00:00 --pty /bin/bash
```

##### interact

The `interact` command is an in-house script for starting interactive sessions

```
→ interact -h
Usage: interact [OPTIONS]
    -d                Turn on debugging information
      --debug
    --noconfig        Do not process config files
    -gpu              Allocate 1 gpu in the GPU-shared partition
      --gpu
    --gres=<list>     Specifies a comma delimited list of generic consumable resources.
                      e.g.: --gres=gpu:1
    --mem=<MB>        Real memory required per node in MegaBytes
    -N Nodes          Number of nodes
      --nodes
    -n NTasks         Number of tasks (spread over all Nodes)
    --ntasks-per-node=<ntasks>
                      Number of tasks, 1 per core per node.
    -p Partition      Partition/queue you would like to run on
      --partition
    -R Reservation    Reservation you would like to run on
      --reservation
    -t Time           Set a limit on the total run time. Format include
                      mins, mins:secs, hours:mins:secs. e.g. 1:30:00
      --time
    -h                Print this help message
      -?
```

* At the moment there is only one partition, named `compute`, so the following two commands are equivalent

  ```
  interact
  ```

  or

  ```
  interact -p compute
  ```

* To specify the amount of memory use the option `--mem=<MB>`. For example `interact --mem=1Tb`
* This is a shared partition. If you wish to get all the resources in a compute node, use the option `--nodes`. For example `interact -N 1`. Because the nodes are shared with other users, please be considerate when requesting a whole node.

### scancel

```
scancel - Used to signal jobs or job steps that are under the control of Slurm.

SYNOPSIS
    scancel [OPTIONS...] [job_id[_array_id][.step_id]] [job_id[_array_id][.step_id]...]
```

* To cancel a specific job use the command `scancel <job_id>`. For example `scancel 00001`
* To cancel all your running jobs use the command `scancel -u <username>`. For example `scancel -u icaoberg`.

## Exercises

### Exercise 1. Trying the Napari BIL Data Viewer

<img src="https://imgur.com/gkDCsMd.gif" />

This plugin enables viewing of datasets archived in the [Brain Image Library](https://www.brainimagelibrary.org/).

:::warning
:warning: This plugin is under early development. Currently, only a subset of single color, fMOST datasets which include projections are available to view.
:::

Although there are several ways to deploy Napari on your laptop or desktop, at the moment we recommend installing it locally using Anaconda.
#### Installing Anaconda

<img src="https://upload.wikimedia.org/wikipedia/en/c/cd/Anaconda_Logo.png" width="50%"/><br>

Follow the [official documentation](https://docs.anaconda.com/anaconda/install/index.html) to install Anaconda or Miniconda on your local system.

#### Installing Napari

The easiest way to install Napari is from [conda-forge](https://conda-forge.org/). To install Napari, run the command

```
conda install -c conda-forge napari
```

![](https://i.imgur.com/7t1cUU8.png)

If the command above fails or Napari fails to start, you can also try

```
conda create -y -n napari-env -c conda-forge python=3.8
conda activate napari-env
pip install 'napari[all]'
```

#### Installing the napari-bil-data-viewer

To install the plugin, we need to start Napari first. To start Napari, run the command

```
napari
```

This will open the Napari window

![](https://i.imgur.com/nmOQXMn.png)

I am running Napari on macOS, but if you are running Windows or Linux you should see a similar menu.

Clicking `Install/Uninstall Plugins...` should open the window below

![](https://i.imgur.com/FWtJ1hO.png)

Search for `napari-bil-data-viewer` and click `Install`

![](https://i.imgur.com/5cY5xQN.png)

If it was installed properly, the plugin will be listed as `Installed`

![](https://i.imgur.com/lwUdCf5.png)

Close the window and go back to the main window.
:::info
:pencil: If you are familiar with the terminal, you can run the following commands instead of following the steps above

```
conda create -y -n bil-viewer python=3.8
conda activate bil-viewer

# Install napari-bil-data-viewer
pip install napari-bil-data-viewer
```
:::

#### Using the plugin

On the top menu select

![](https://i.imgur.com/TQq7QXQ.png)

This should open a panel on the right

![](https://i.imgur.com/mu0Ul8R.png)

Use the drop-down list to choose the images you want to explore with Napari

![](https://i.imgur.com/O6zUWd0.png)

If you are curious about the status of your session in Napari, you can monitor the logs in the terminal from which you started the application

![](https://i.imgur.com/BGTFd0D.png)

### Exercise 2. Contrast-stretching with ImageMagick

This exercise ties together all the concepts discussed in this workshop.

Imagine we are interested in collection `84c11fe5e4550ca0`, which I found in the portal

![](https://i.imgur.com/pTX96xE.jpg)

:::info
:bulb: There is no need to download the data locally because the data is available when you use our resources.
:::

![](https://i.imgur.com/TMhoZTw.png)
*I can navigate to `/bil/data/84/c1/84c11fe5e4550ca0/` to see the contents of the collection.*

Unfortunately, it is difficult to visually inspect the images because they are not contrast stretched.

![](https://i.imgur.com/WzYR67M.png)
*The images are not contrast stretched and cannot be visually inspected.*

Fortunately, there are tools like Fiji that can contrast stretch the images. However, I want to do this in batch mode as a job, since this process can be automated.

![](https://i.imgur.com/Z6AcrjZ.jpg)

[ImageMagick](https://imagemagick.org/index.php) is a robust library for image manipulation.
The `convert` tool in this library has a [contrast-stretch](https://imagemagick.org/script/command-line-options.php#contrast-stretch) option. The format is

```
convert <input-file> -contrast-stretch <black-point>{x<white-point>}{%} <output-file>
```

Next I will create a file called `script.sh` and place it in a folder on my Desktop.

```
#!/bin/bash

#this line is needed to be able to use modules on the compute nodes
source /etc/profile.d/modules.sh

#this command loads the ImageMagick library
module load ImageMagick/7.1.0-2

#this for loop finds all the images in the sample folder and contrast-stretches them
for FILE in /bil/data/84/c1/84c11fe5e4550ca0/SW170711-04A/*tif
do
	#the paths are quoted so file names with spaces are handled correctly;
	#each output file is written to the directory the job was submitted from
	convert "$FILE" -contrast-stretch 15% "$(basename "$FILE")"
done
```

:::info
:bulb: For simplicity, you can find the script above in

```
/bil/workshops/2022/data_submission
```

To copy the script to your Desktop, run this command in the terminal

```
cp /bil/workshops/2022/data_submission/script.sh ~/Desktop/
```
:::

Next I can submit my script using the command

```
sbatch -p compute --mem=64Gb script.sh
```

Because the script processes the images one at a time, it does not need much memory. If I were to process them in parallel, I might need more.

To monitor your job progress use the command `squeue -u <username>`. For example,

```
squeue -u icaoberg
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             14243   compute script.s icaoberg  R      15:34      1 l001
```

This leads to

![](https://i.imgur.com/uHB0Vr6.png)

---

The Brain Image Library is supported by the National Institute of Mental Health of the National Institutes of Health under award number R24-MH-114793. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.