This is a note on running the codes on the SBND machines or submitting grid jobs on S3DF OnDemand. To do analysis, click here: [**SBND Analysis Notes**](https://hackmd.io/hjfCIKJuT--wezAzj-jRkg?view)
# SBND Coding Notes
[TOC]
# Before getting started
* Ensure `sbnd_login.sh` script is there (`/exp/sbnd/app/users/castalyf/mybashscripts/sbnd_login.sh`)
```
#!/bin/bash
echo "********** runnning initial setup *************"
source /cvmfs/sbnd.opensciencegrid.org/products/sbnd/setup_sbnd.sh
echo "******** setting up sbndcode ************* "
setup sbndcode v10_06_03 -q e26:prof
echo "************ navigating to the top dierctory **********"
cd /exp/sbnd/app/users/castalyf/my_larsoft_v10_06/
echo "*********** sourcing local products **************"
source localProducts*/setup
#mrbslp
echo "************* setting up envireoment **************"
#mrbsetenv
echo "********** You are set (May be !!!!!!!!!! ) *************"
echo
echo
echo "&&&&&&&&&&&&&&&&&&&&&&& REMINDEER (Date & Time) &&&&&&&&&&&&&&&&&&&&&"
date
echo
echo
echo "\$7 trillion added to your bank account, courtesy of Henry Lay"
cd ..
```
* Ensure `/nashome/c/castalyf/.bashrc` is there and has the necessary aliases, e.g.
```
#Set up dirs
export DATA_DIR=/exp/sbnd/data/users/castalyf
export APP_DIR=/exp/sbnd/app/users/castalyf
#Add path for bash scripts
export PATH=$PATH:$APP_DIR/mybashscripts
#Useful aliases
alias app="cd $APP_DIR"
alias data="cd $DATA_DIR"
alias mlreco_login="source mlreco_login.sh"
alias sbnd_login="source sbnd_login.sh"
alias setup_container="sh /exp/$(id -ng)/data/users/vito/podman/start_SL7dev.sh"
alias setup_container_grid="sh /exp/$(id -ng)/data/users/vito/podman/start_SL7dev_jsl.sh"
```
* Then you'll be able to initialize the setup by hitting `sbnd_login`
* Note: For grid jobs, you have to set up with `sh /exp/$(id -ng)/data/users/vito/podman/start_SL7dev_jsl.sh` (i.e. `setup_container_grid`)
# Run the script in VS Code:
To set up `ssh` from scratch, see the "Set up remote connection" subsection below.
## Enter the SBND machine (remote connection)
1. Log in to the Fermilab portal: open a terminal window in VS Code and type `kinit <username>@FNAL.GOV` (capitals after the '@').
2. Hit `shift`+`command`+`P` to make a remote connection (Remote-SSH: Connect to Host...).
3. Select `sbnd_gpvm` or `sbnd_build`
* *gpvm* = general purpose virtual machine: run macros, write codes, analyze data, etc.;
* *build*: build codes (`mrb i -j15`) or run larsoft (`lar -c …`) sometimes.
4. Hit `sbnd_login` on Terminal's window.
Continue to "Run the script (macro)" section to run your codes.
### Logging in from your local terminal
*We can also do the same thing on Terminal:*
1. To build the codes, hit `ssh sbnd_build_1` on the command line.
2. Hit `. .bashrc` *(this is the only different step from VS code)*.
3. As before, `sbnd_login`.
### Set up remote connection
*This is only for the very beginning setup.*
To be able to enter the SBND machine via `ssh`, we need to:
1. Open the `ssh` script from your local machine: `nano ~/.ssh/config`
2. Add the following lines (taking `gpvm` as an example):
```
Host sbnd_gpvm_1
HostName sbndgpvm02.fnal.gov
User castalyf
ForwardAgent yes
ForwardX11 yes
ForwardX11Trusted yes
GSSAPIAuthentication yes
GSSAPIDelegateCredentials yes
```
We currently have `sbnd_gpvm_1`, ..., `sbnd_gpvm_4` (a matching build-machine entry is sketched right after this list).
3. Then we should be able to connect remotely using `shift+command+P` (or `ssh` from local terminal). If one doesn't work, try another `gpvm`/`build`.
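The note also uses `ssh sbnd_build_1`, so a matching build-host entry is presumably analogous; the host name below is an assumption (based on `sbndbuild03.fnal.gov`, which appears in the grid-job section later), so check the actual node names:
```
Host sbnd_build_1
HostName sbndbuild03.fnal.gov
User castalyf
ForwardAgent yes
ForwardX11 yes
ForwardX11Trusted yes
GSSAPIAuthentication yes
GSSAPIDelegateCredentials yes
```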
## Set up `larsoft` env
* See the note here: https://shortbaseline.slack.com/docs/T7P7C3UAK/F09BQA0V6DU
## Run the script (macro)
5. Navigate to the correct path (e.g. `/exp/sbnd/app/users/castalyf/tutorial/data`).
6. Run the file: `root [name].C`; this brings up the ROOT interface.
7. In ROOT window, hit `.L [name].C+` (make sure it is 'C+' at the end).
8. Next, call `[name] f`.
9. Finally, type `f.Loop();` and the output should be displayed.
10. To exit ROOT, type `.q`.
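Putting steps 6-10 together, a complete session looks like this (`mymacro` is a hypothetical macro name following the `MakeClass`-style skeleton these steps assume):
```
$ root mymacro.C
root [0] .L mymacro.C+
root [1] mymacro f;
root [2] f.Loop();
root [3] .q
```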
# Every-Day Setup
1. Start a fresh terminal.
2. Start the container (we need this all the time!) by `setup_container`
* This assumes you already have the bash script. If not, simply run this command:
`sh /exp/$(id -ng)/data/users/vito/podman/start_SL7dev_jsl.sh`
3. Navigate to your working directory (e.g.:`cd /exp/sbnd/app/users/castalyf/my_larsoft_v10_06/srcs/sbndcode/sbndcode/Overlays`)
4. Set up SBND environment by `sbnd_login` (you've had the bash script from [this section](https://hackmd.io/xjAl8h0vQiCk2QOJFqPZgQ#Before-getting-started).)
5. Get started!
## Compile the `sbndcode`
The `sbndcode` should be located under `<your_larsoft_dir>/srcs/`. This is also where you build the codes.
Every time we make changes in the `sbndcode` repository, we need to recompile the codes. To do this, be sure to run the following commands (assuming you finished the five setup steps above):
0. *(Optional)* `cd <path_to_your_larsoft_area>` and then `source localProducts*/setup` to set up the default area you would like to run MRB.
0. *(Optional)* `mrb zd` to clean your build directory
1. `mrbsetenv`
2. `mrb i -j 4` (the main process for compiling, which takes longer)
3. `mrbslp` (don't forget to run this)
Then you can proceed with the `lar` commands, etc.
* Be sure to run `source localProducts*/setup` under your `/larsoft` area if you encounter an error about a missing `fcl` file; after that you should be able to run the `lar` commands.
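These recompile steps can be bundled into one helper script, e.g. a sketch like the following (the script name and larsoft path are placeholders; source it rather than executing it, since `mrbsetenv`/`mrbslp` must modify your current shell):
```
# rebuild_sbnd.sh (hypothetical) -- usage: source rebuild_sbnd.sh
cd /exp/sbnd/app/users/<username>/<your_larsoft_area>   # adjust to your area
source localProducts*/setup    # set up the local products area
mrbsetenv                      # set up the build environment
mrb i -j 4                     # main compile/install step (takes a while)
mrbslp                         # set local products (don't forget!)
```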
---
# Generate samples
- *WHY?* - In order to simulate particles and feed them into the ML training chain to see how well the training performs (and then do analysis), we have to create a bunch of "particles".
- *HOW?* - Generate particles through a series of simulation steps. In this note we don't dig into the details; we just run the steps by hand, but bear in mind that a lot of simulation machinery is running in the background.

*(Figure omitted. Left: ICARUS; right: SBND.)*
To put all steps simply:
1. **Generator**: Create particle objects.
2. **GEANT4 (G4)**: Make these "particles" physically behave like particles via a series of MC simulation steps.
* Note: GENIE / MPVMPR generates the *primary particles*, and GEANT4 tracks their *interactions* in the detector.
3. **Detsim (Detector simulation)**: Simulate how the particles appear in the detector (e.g. $\nu$-Ar interactions in the LArTPC, etc.)
4. **Reco1**: The simulated signals are recorded through signal processing, and then the space points can be reconstructed.
5. **Supera**: Typically at the same time we run Reco1, we also run Supera to do the **label making** before things are fed into the ML chain.
After all these steps, we have a set of samples which behave like "real particles" inside the detector. We also have them labelled (via Supera), and then we can feed them into the [ML Reco chain (SPINE)](https://hackmd.io/@castalyfan1012/SkNBTv8p6)!
What happens next?
* The samples will be trained via **[SPINE](https://github.com/DeepLearnPhysics/spine/tree/develop)** (i.e. ML reco chain).
* We can do [physics analysis](https://github.com/DeepLearnPhysics/spine_workshop_2024/tree/main) after that, and of course, using **testing samples**.
* Note that people are going to make **CAF** (Common Analysis File) after the training stage. For ICARUS, these help with comparing analyses between SPINE and Pandora.
## Make a new local build
*For example, we are going to generate MPVMPR samples, here are some steps...*
> *If you need to install the new version of `sbndcode`, start from here:*
>
> 1. Navigate to the directory (under `/exp/sbnd/app/...`), run:
> `source /cvmfs/sbnd.opensciencegrid.org/products/sbnd/setup_sbnd.sh`
> 2. Create a directory:
> `mkdir <my_directory>`
> and then
> `cd <my_directory>`
> 3. Make a local build for the most recent code (e.g. v09_89_01):
> `setup sbndcode v09_89_01 -q e26:prof`
### Build the codes
4. Install and develop larsoft software:
(1) `mrb newDev`
(2) `source localProducts*/setup`
(3) `cd srcs/`
(4) `mrb g -t v09_89_01 sbndcode` (e.g. v09_89_01 in this case)
(5) `mrb g -t v09_89_01 sbncode`
5. *If you are just going to build the codes, start from here (every time you modify the codes under `/srcs`, you will need to rebuild with these steps):*
(1) `cd $MRB_BUILDDIR`
(2) `mrbsetenv`
(3) `mrb i -j15` (the main build/install step)
(4) `cd ../`
(5) `mrbslp` (= multi repository build: set local products)
## Run the codes (simulation)
6. Update the `bash` script to point to the correct directory, and check that the version is up to date when hitting `sbnd_login`. *(Note: this step is required **only** when we have made a new local build, that is, steps 1 - 4.)*
7. Navigate to `/exp/sbnd/data/...` and hit `mrbslp`
8. Run the following:
```
run_mpvmpr_sbnd.fcl
g4_sce_lite.fcl
detsim_sce_lite.fcl
reco1_sce_mpvmpr_lite.fcl
```
i.e.
```
lar -c run_mpvmpr_sbnd.fcl -n 10
lar -c g4_sce_lite.fcl -s <output root file from the previous one>
lar -c detsim_sce_lite.fcl -s <output root file from the previous one>
lar -c reco1_sce_mpvmpr_lite.fcl -s <output root from the previous one>
```
* **NOTE (updated Jun 2025)**: The formal FHiCL files for SBND are listed here
https://github.com/SBNSoftware/sbndcode/blob/develop/sbndcode/JobConfigurations/README.md
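Since each stage consumes the previous stage's output, the four commands can be chained in a small script; below is a sketch where the explicit `-o` output names are my own choice (by default `lar` generates its own output file names, so adapt as needed):
```
#!/bin/bash
# chain the four simulation stages (sketch; -o names are illustrative)
set -e   # stop on the first failing stage
lar -c run_mpvmpr_sbnd.fcl -n 10 -o gen.root
lar -c g4_sce_lite.fcl -s gen.root -o g4.root
lar -c detsim_sce_lite.fcl -s g4.root -o detsim.root
lar -c reco1_sce_mpvmpr_lite.fcl -s detsim.root -o reco1.root
```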
Once we have `reco1`'s root file at hand, we may run one more step here for label making:
```
lar -c run_supera_sbnd_mpvmpr.fcl -s <output root from the previous one>
```
* **NOTE**: The Reco1 stage outputs LArCV files (`.root` extension). We typically move these files to S3DF for SPINE processing (which then outputs HDF5).
### Example: Run SBND overlay modules
Follow step by step:
```
lar -c prodgenie_nu_spill_tpc_overlay_sbnd.fcl -s /path/to/decoded/file.root -n 1
lar -c standard_g4_sbnd.fcl -s /path/to/previous/output.root
lar -c standard_detsim_overlay_sbnd.fcl -s /path/to/previous/output.root
lar -c overlay_waveforms.fcl -s /path/to/previous/output.root
```
* To check the output log:
```
lar -c eventdump.fcl -s /path/to/previous/output.root
```
* To produce an analyzable ROOT file (`overlay_ana_sbnd.root`):
```
lar -c run_overlayana_sbnd.fcl -s /path/to/previous/output.root
```
* **Debugger**: If there's an error about a missing `pds_calibration.db`, try running this before the `run_overlayana_sbnd.fcl` command: `export FW_SEARCH_PATH=/pnfs/sbnd/resilient/users/mguzzo/db_files/:$FW_SEARCH_PATH`
### Example: For MyAnalyzer
Sometimes we need to modify/update the `MyAnalyzer_module.cc`. Before building the analyzer, we might need to do the following update:
1. Make sure to have `MyAnalyzer` directory somewhere (typically under `/srcs/sbndcode/sbndcode`)
2. Run the following in Terminal window:
```
echo 'add_subdirectory(MyAnalyzer)' >> ../CMakeLists.txt
```
Or alternatively, in `srcs/sbndcode/sbndcode/CMakeLists.txt` add this line
```
add_subdirectory(MyAnalyzer)
```
3. Build the codes (i.e. Step 5 above).
4. Update the `MyAnalyzer_module.cc` with the needed things.
5. Then `cd` to `data` directory and enter the following command, which outputs the `MyAnalyzer.root`:
```
lar -c $MRB_SOURCE/sbndcode/sbndcode/MyAnalyzer/MyAnalyzer.fcl -s <some output.root>
```
- Note that running `mrbslp` sets up your local products so that all the `.fcl` files in your source area are in your path automatically.
- For more detail, see https://sbnsoftware.github.io/SBNYoung/particle_gun_tut.html
### Debug
* Check the MRB version: `echo $MRB_PROJECT_VERSION`
* For `detsim` issue: `fhicl-dump detsim_sce_lite.fcl &> fcldump_detsim.log`
* Check the labels: `lar -c eventdump.fcl -s <prod_detsim...root> -n 1 &> eventdump_detsim.log`
* Reference: https://sbnsoftware.github.io/sbndcode_wiki/commissioning/SBND_Commissioning_Get_Started.html
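For example, the `fhicl-dump` check above pairs well with `grep` for hunting a suspect parameter in the dumped configuration (the search term here is illustrative):
```
# dump the fully-resolved fcl, then search it (search term is hypothetical)
fhicl-dump detsim_sce_lite.fcl &> fcldump_detsim.log
grep -n "SpaceCharge" fcldump_detsim.log
```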
# Every-day Setup in SL7
## Set up ROOT in SL7
Assuming you already have `alias setup_container="sh /exp/$(id -ng)/data/users/vito/podman/start_SL7dev.sh"` in `/nashome/c/castalyf/.bashrc`:
1. Set up SL7 container by `setup_container`
2. Set up SBND, that is, `sbnd_login`
3. Use UPS (Unix Product Support) to find ROOT: `ups list -aK+ root`
4. Set up ROOT with version e.g. `setup root v6_28_12 -q e26:p3915:prof`
5. Ready to go!
## Set up `sbndcode` in SL7
Similar to ROOT's case above:
1. `setup_container`
2. `sbnd_login`
3. `ups list -aK+ sbndcode` (again, optional unless you're looking for the latest version)
4. `setup sbndcode v10_06_00_02 -q e26:prof`
---
# Event Display
For SBND, the event display is viewed with TITUS:
1. Enter SL7 container (`setup_container`) on a fresh terminal.
2. Set up TITUS environment, `source /exp/sbnd/app/users/sbnd/static_evd/setup.sh`
3. Enter the event display, `evd.py -s <path/to/art_root/file>` (wait until the XQuartz interface pops up)
* It can alternatively be opened with `evd.py -s`, and the file can then be opened from the window tab.
---
# Access Data/MC samples
All samples are catalogued by the SAM definitions:
{%preview https://sbnsoftware.github.io/sbn/sbnprod_wiki/sample.html %}
The `samweb` command can be run when you're done with `setup_container` and `sbnd_login`. Some common commands:
* To list all files in the dataset: `samweb -e sbnd list-definition-files {definition}`
* To find the location of one file: `samweb locate-file {root_filename}`
* To find the parent/children stage directory of one file:
* Find parent: `samweb file-lineage parents {root_filename}`
* Find children: `samweb file-lineage children {root_filename}`
## Access samples for analysis
Here we take the hit efficiency study as an example. For large-scale analysis with the calibration ntuples, we typically use `xrootd` file access. It is suggested to have a bash script, e.g. `get_xrootd.sh`:
```
#!/bin/bash
# convert each SAM file name into an xrootd access URL
while read -r fname; do
    furl=$(samweb get-file-access-url --schema root "$fname")
    echo "${furl}"
done < filelist.txt # change to the input file name
```
Then,
1. List SAM files and save as a list: `samweb list-definition-files "{definition}" > filelist.txt`
2. Convert the files into `xrootd` format: `./get_xrootd.sh > filelist_xrootd.txt` (might take a while)
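Before launching the full analysis, it's worth verifying that `xrootd` access actually works; a minimal check (any URL from the list will do):
```
# try opening the first xrootd URL in ROOT; an error here means no access
FURL=$(head -n 1 filelist_xrootd.txt)
root -l -b -q "${FURL}"
```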
## Decode files
Note that sometimes we access the raw files, and they may need to be decoded before analysis:
```
lar -c run_decoders_job.fcl -s {file_path} -n {number_events}
```
Here the `{file_path}` is the path to the ROOT file returned by the `samweb locate-file` command; you need to append the specific `.root` file name at the end.
* For PDS-related analysis: after decoding, we need to run one more step to access the optical waveform info:
```
lar -c /exp/sbnd/data/users/acastill/WaveformCalibration/run_wvfmana_undeco.fcl -s {decoded_file}
```
## Example: Hit efficiency study (from `Ntuple`)
Log in to SL7 container and set up SBND, then:
1. Navigate to the work area, e.g. `cd /exp/sbnd/app/users/castalyf/hiteff_2512`
* You should at least have these files for analysis:
* For setting up: `source setup_simple.sh`
* To convert samples into `xrootd`: `get_xrootd.sh`
* To visualize Ntuple dataframe: `analyze_ntuple.C`
* To identify dead channels: `dead_wires.C`
* Main analyzer script: `hit_analyzer.C`. The hit efficiency and pitch info will be filled into two CSV files (`hiteff_data.csv` and `hiteff_mc.csv`)
* Main analyzer script for split TPC regions: `hit_split_regions_data.C` for data and `hit_split_regions_mc.C` for MC
* To make plots: `hit_plotter.C` and `plot_split_regions.C`
* To identify specific events: `event_info_viewer.C`
2. Set up environment, `source setup_simple.sh`
3. *(Optional)* It's good to check the `ntuple` structure before analysis: `root analyze_ntuple.C` (make sure the file path is the right one)
4. Identify the dead channels (wires): `root dead_wires.C`; this outputs `hit_wires.root` and `dead_channels.csv`
5. Analyze the hit efficiency and save the info into the CSVs: `root hit_analyzer.C` (now for both MC and data)
* This takes a while, so it is recommended to run it in the background instead: `nohup root -l -b -q 'hit_analyzer.C' > log.txt 2>&1 &`
6. Make plots for hit efficiency vs average pitch: `root hit_plotter.C`
7. To visualize the results within different TPC regions: similar to steps 5-6, but with `hit_split_regions_data.C` (for MC: `hit_split_regions_mc.C`) and then `plot_split_regions.C`
8. If we are going to identify specific events/wires/timestamps, we can run the following script:
- `root`
- `.L event_info_viewer.C`
- `show_event_info("{root_file_path}")`
* **Note**: The above example typically compares data vs MC when plotting, so I usually use separate terminals to run data and MC independently, and then make plots for both together.
---
# LED Timing Calibration
## Get files prepared
0. Regular setup: `setup_container` --> `sbnd_login` --> `cd {path_to_/larsoft}` --> `source localProducts*/setup` --> `mrbslp` --> `cd /exp/sbnd/data/users/castalyf/PMT_timing/LED_data`
1. Decode the raw file (`/pnfs/.../data_EventBuilder....root`):
```
lar -c run_decoders_job.fcl -s {raw_file}
```
The main output file to be used is `data_EventBuilder_..._decoded-filtered.root`
2. *[optional]* To retrieve the waveforms information:
```
lar -c /exp/sbnd/data/users/acastill/WaveformCalibration/run_wvfmana_undeco.fcl -s {decoded_file}
```
This will output `wvfm_ana_undeco.root`; be sure to rename it properly.
3. *[optional]* To get channel info based on the potential PMT responses, the notebook `PMT_waveform_test.ipynb` should be helpful.
## Analysis
Main steps for analysis:
1. Navigate to the directory, e.g. `/exp/sbnd/data/users/castalyf/PMT_timing`
2. For the decoded file, we first need to make a TTree with
* Run #: Data collection session ID
* Event # (int): LED flash number
* Channel # (int): PMT ID (6-305)
* Start tick of waveform (int)
* Waveform vector (vector of ints): 100 values, raw ADC samples e.g. [14000, 14000, ..., 12000, ..., 14000]
* Start tick of derivative (int)
* Derivative vector (vector of floats): 80 values, normalized waveform e.g. [1.0, 1.0, ..., 0.3, ..., 1.0]
All these can be done by running:
```
root 'AnalyzerMakinTree.C("{decoded_root}")'
```
This will create the initial ROOT tree with waveform data. Note the file name should be something like `data_EventBuilder..._decoded-filtered.root`.
3. The previous step will generate an output file, `outfile_ana-..._decoded-filtered.root`, along with the waveform/pulse plots (under the `plots-..._decoded-filtered` folder).
We now have to open `PMTWaveformTree.h` and replace the input filename with this output file.
4. The main analysis script is `PMTWaveformAnalyzer.C`; it analyzes the timing and creates plots by doing the following:
* Reads PMT waveform data from a ROOT TTree
* Finds signal timing using derivative half-minimum method
* Applies per-channel time delay corrections
* Groups hits by run and event, computes per-event average PMT time and PMT time delays relative to event average
* Computes mean and RMS delays per PMT and applies truncated statistics (±1 RMS)
* Fits Gaussian to per-PMT delay distributions
* Maps PMT timing statistics to detector geometry; separates PMTs by x>0 (West) and x<0 (East)
* Computes PMT radius from detector origin, and fits delay vs radius trends
* Generates summary tables printed to stdout
To run the script, simply open a `root` session and then:
```
.L PMTWaveformAnalyzer.C
PMTWaveformTree t;
t.Loop();
```
And the outputs include:
* ROOT file: `output_waveforms_combined_LED1.root`
* Per-channel absolute time histograms
* Per-channel relative-time histograms
* Average event time histogram
* 2D heatmaps of mean PMT delays (East/West)
* 2D heatmaps of RMS PMT delays (East/West)
* 2D heatmaps of Gaussian mean and sigma
* PMT delay vs radius TGraphErrors (East/West)
* Combined radius vs delay plot
PNG plots are saved in the `Plots/` folder.
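As with the hit-efficiency analyzer earlier, a long run can be pushed to the background with the same `nohup` pattern; `run_pmt.C` here is a hypothetical driver macro wrapping the three ROOT lines above:
```
# run_pmt.C (hypothetical) would contain:
#   gROOT->ProcessLine(".L PMTWaveformAnalyzer.C");
#   gROOT->ProcessLine("PMTWaveformTree t; t.Loop();");
nohup root -l -b -q 'run_pmt.C' > pmt_log.txt 2>&1 &
```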
---
# SBND Grid Job (e.g. simulation files)
* *WHY?* - Producing samples is time consuming, so we want to run those jobs on more powerful machines, i.e. via the grid.
1. Open a fresh Terminal (outside of VS code), log in to `sbnd_build` (e.g. `ssh sbndbuild03.fnal.gov`)
2. Run `. .bashrc`
3. Run `sh /exp/sbnd/data/users/vito/podman/start_SL7dev_jsl.sh`
* Note: If you hit a bug at the `reco` stage, use this container instead - `sh /exp/sbnd/data/users/vito/podman/start_SL7dev_jsl_test.sh`
4. Next, `source /cvmfs/sbnd.opensciencegrid.org/products/sbnd/setup_sbnd.sh`
5. Setup SBND code: `setup sbndcode v10_06_00_02 -q e26:prof`
6. Go to the grid folder (e.g. `/exp/sbnd/app/users/castalyf/grid_jobs`), make sure the `xml` is there, and that the paths inside point to your own directories.
7. To submit the job, run
```
project.py --xml <YOUR_XML> --stage <YOUR_STAGE> --submit
```
* For example, the first stage is `gen` so it will be:
```
project.py --xml intime_v2.xml --stage gen --submit
```
* It might ask you to complete authentication through the link. If it doesn't pop up promptly, manually type `htgettoken -a htvaultprod.fnal.gov -i sbnd` and complete the authentication before running the `project.py` command.
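A small wrapper can make sure the token is fresh before every submission; a sketch (the script name and argument order are my own):
```
#!/bin/bash
# submit_stage.sh (hypothetical) -- usage: ./submit_stage.sh <xml> <stage>
XML=$1
STAGE=$2
htgettoken -a htvaultprod.fnal.gov -i sbnd || exit 1   # authenticate first
project.py --xml "$XML" --stage "$STAGE" --submit
```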
## Check grid job status:
* https://fifemon.fnal.gov/monitor/d/000000004/experiment-overview?var-experiment=sbnd&orgId=1&var-pool=dune-global&var-pool=fifebatch
* Check by user: https://fifemon.fnal.gov/monitor/d/000000116/user-batch-details?orgId=1&var-cluster=fifebatch&var-user=castalyf&from=now-30m&to=now
* Check the log by job ID: https://fifemon.fnal.gov/monitor/d/JL6pUwB4k/submission-summary?var-cluster=24854881&var-schedd=jobsub05.fnal.gov&from=1763602594538&to=1763604394538&orgId=1
## Merge ROOT files
To merge multiple output samples into one single file, we can:
```
hadd <combined_file_name>.root /path/to/files/*.root
```
It might take a while, depending on the sizes of all files.
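For instance, merging all `reco1` outputs of one campaign (path pattern borrowed from the `rsync` example below; the placeholders are mine):
```
hadd reco1_combined.root /pnfs/sbnd/scratch/users/<username>/<project>/reco1/*_*/*.root
```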
# Move the files to S3DF
* To move the file from VS code to S3DF for data analysis, do:
```
rsync -a <root output> castalyf@s3dflogin.slac.stanford.edu:/path/to/data
```
* To move to the `data` area (i.e. where we need to `ssh neutrino`): Replace `s3dflogin` with `s3dfdtn`. For example:
```
rsync -a /pnfs/sbnd/scratch/users/castalyf/nue_v10_04_07/Apr2025_10k/reco1/*_*/larcv*.root castalyf@s3dfdtn.slac.stanford.edu:/sdf/data/neutrino/castalyf/larcv_2504/nue_v2
```
---
# Run analysis with CAF
We expect to have a "common analysis file" within the SBN program for people to do analysis, in which case the analysis machinery and even the formatting of the plots will be consistent across all relevant studies.
The bash scripts for reference can be found here (only needed when building from scratch):
* To build up: `/exp/sbnd/app/users/castalyf/spine_caf/create_dev.sh`
* To run: `/exp/sbnd/app/users/castalyf/spine_caf/run_dev.sh`
Here are the steps to run the analysis:
1. Open the terminal (VS code recommended), and set up the container:
```
sh /exp/$(id -ng)/data/users/vito/podman/start_SL7dev.sh
```
2. Navigate to the working area, e.g. `/exp/sbnd/app/users/castalyf/spine_caf`
* For the first-time build: `source create_dev.sh`
* After building, some directories including `spine` should be created.
3. Set up environment:
```
source /exp/sbnd/app/users/castalyf/spine_caf/run_dev.sh
```
4. Go to `spine/cafana/build/` - this is where we build the codes. The codes have to be rebuilt whenever we modify a module, e.g. `example.cc`
5. [*Optional*] Whenever we have made changes in the `CMakeLists.txt`, we have to regenerate the build files with the command
```
cmake ..
```
(we are in `/build`, so we need to `cmake` the parent directory where the `CMakeLists.txt` is located).
6. To compile the code:
```
make
```
(make sure we're in `spine/cafana/build/`)
7. To run the code:
```
./<module>
```
e.g. `./example` if we built `example.cc`
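Putting steps 4-7 together, the edit-rebuild-run cycle is simply (a sketch, using the `example` module from above):
```
# after editing your module (e.g. example.cc) or its CMakeLists.txt:
cd spine/cafana/build
cmake ..     # only needed if CMakeLists.txt changed
make         # rebuild the executables
./example    # run the module
```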
### Debugger
* If you get something like an `Auth failed` error, try `htgettoken -a htvaultprod.fnal.gov -i sbnd` to obtain the token correctly.
---
# Batch Job on S3DF (from `.root` to `.h5`)
## Batch job to create `.h5` files (e.g. with updated config)
When you have the sample `.root` files, to do physics analysis on S3DF (e.g. with SPINE), it is necessary to process them into `.h5` files. Sometimes, for example when you make changes in the configuration (e.g. updating SBND post-processor's cfg), you will also have to re-process the samples into `.h5` and then move on to the analysis steps.
1. Access the container where the data is located i.e. `ssh neutrino`.
2. Navigate to the directory where you have the `.cfg`, e.g. `/sdf/data/neutrino/castalyf/energy_calibration_202411/latest.cfg`. Update the config e.g. `nano latest.cfg` and do your changes.
3. Produce the list of paths to the files:
```
ls -1 <path_to_sample's_root_files> > <file_list.txt>
```
For example:
```
ls -1 /sdf/data/neutrino/sbnd/simulation/intime_v01/v09_89_01/larcv*.root > larcv_list.txt
```
4. Source the environment:
```
source /sdf/data/neutrino/software/spine_prod/configure.sh
```
5. Run the grid job:
```
bash <path_to_run.sh> --config <config_you_updated> --ntasks <number_of_processes> -t <estimated_time_to_run> <file_list.txt>
```
For example:
```
bash $MLPROD_BASEDIR/run.sh --config latest.cfg --ntasks 8 -t 4:00:00 larcv_list.txt
```
Or more specifically:
```
bash $MLPROD_BASEDIR/run.sh -n 1 --cpus-per-task 16 --files-per-task 200 --config latest.cfg --partition ampere --account neutrino:icarus-ml -t 57:00:00 larcv_list.txt
```
* Note: There are some partitions: `ampere`, `turing`, `roma`, etc. Basically `ampere` is the default one.
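Before submitting, a quick sanity check on the list size helps when choosing `--ntasks`/`--files-per-task` (the coverage rule in the comment is my assumption based on the flags above):
```
# count input files; presumably ntasks * files_per_task should cover them all
NFILES=$(wc -l < larcv_list.txt)
echo "larcv_list.txt contains ${NFILES} files"
```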
You are done!
## An alternative way
Alternatively, you can replace Steps 4-5 with these lines:
```
source /sdf/data/neutrino/castalyf/dqdx_study_202501/spine_prod_dev/configure_sbnd.sh
export SPINE_BASEDIR=/sdf/data/neutrino/castalyf/dqdx_study_202501/spine_dev
SPINE_CFG=/sdf/data/neutrino/castalyf/dqdx_study_202501/spine_prod_dev/gainstudy_250123.cfg
bash $MLPROD_BASEDIR/run.sh -n 1 --cpus-per-task 16 --files-per-task 200 --config $SPINE_CFG --partition turing --account neutrino:icarus-ml -t 3:00:00 larcv_list.txt
```
A practical example (don't forget to include `bash` at the front) is the quick note below.
## Quick note updated 05/2025
```
ssh neutrino
cd /sdf/data/neutrino/castalyf/spine/prod/intime_v01
ls -1 /sdf/data/neutrino/sbnd/simulation/intime_v01/v09_89_01/larcv*.root > larcv_list.txt
source /sdf/data/neutrino/software/spine_prod/configure.sh
bash $MLPROD_BASEDIR/run.sh --config sbnd_full_chain_250328.cfg --ntasks 6 --files-per-task 827 --partition ampere --time 15:00:00 --flashmatch larcv_list.txt
```
## Check job status
Now the grid job has been submitted! To check the status:
* To check the running job(s) and ID(s): `squeue -u <username>`, or alternatively:
* `squeue | grep castalyf`
* `squeue --user=castalyf` which displays more info.
* To check the output/error messages: `less batch_logs/prod_spine_<id>.out` (Tips: hit `q` to quit; replace `less` with `tail` to view only the most recent lines.)
* To check the sbatch script: `less submit_prod_spine_<time_id>.sh`
* To cancel the submitted job: `scancel <id>`
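To keep an eye on a long-running job, the standard `watch` utility pairs nicely with `squeue`:
```
watch -n 30 "squeue -u $USER"   # refresh every 30 s; Ctrl-C to stop
```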
### Resources
* The original instruction can be found here:
https://github.com/DeepLearnPhysics/spine_prod/
* More about `slurm` commands: https://confluence.slac.stanford.edu/display/PCDS/Useful+SLURM+commands
---