# Theta ADSP Allocation

###### tags: `general` `theta` `adsp` `allocation` `quota`

## Quotas

Storage directory (Lustre): `/grand/neutrino_osc_ADSP/`.

To list the disk quota:

```bash=0
myquota
Name      Type  Filesystem  GB_Used  GB_Quota  Grace
===============================================================================================
deltutto  User  mira-home   0.00     100.00    none
```

```bash=0
myprojectquotas
Lustre : Current Project Quota information for projects you're a member of:

Name               Type     Filesystem  Used  Quota  Grace
===============================================================================================
neutrino_osc_ADSP  Project  grand       4k    240T   -
```

To list the project allocation quota:

```bash=0
sbank-list-allocations -p neutrino_osc_ADSP
Allocation  Suballocation  Start       End         Resource  Project            Jobs  Charged  Available Balance
----------  -------------  ----------  ----------  --------  -----------------  ----  -------  -----------------
7141        7009           2021-05-01  2022-05-01  thetagpu  neutrino_osc_ADSP  0     0.0      2,000.0
7142        7010           2021-05-01  2022-05-01  theta     neutrino_osc_ADSP  0     0.0      72,000.0
```

# :memo: LArSoft

Lustre is basically parallel IO: each file gets chunked and written to a bunch of different high-performance drives. The idea is that when everyone is reading and writing, the actual work isn't bottlenecked and everyone gets good performance. In reality, there is a metadata server that knows where the files actually live (file A chunked on drives 1, 2, ..., 50, ...). Sometimes that server gets overwhelmed, but `/grand` is a brand-new system with more servers, I think. The number of servers that files get written to is called the `stripe count`.

```bash=0
cd /grand/neutrino_osc_ADSP
mkdir sbndcode
chmod g+w sbndcode/

# Set the number of servers that files in sbndcode are written to
lfs setstripe -c 40 sbndcode/

# Pull products
./pullProducts ./sbndcode/ slf7 sbnd-v09_24_02 e20 prof
```

## Singularity

Corey built a Singularity image from a Docker image made by [FNAL](https://hub.docker.com/r/fermilab/fnal-wn-sl7/): `singularity build fnal-wn-sl7.sing docker://fermilab/fnal-wn-sl7:latest`. Then we run (from `/grand/neutrino_osc_ADSP`):

```bash=0
singularity exec -B /grand fnal-wn-sl7.sing bash
Singularity> source sbndcode/setup
Singularity> setup sbndcode v09_24_02 -q e20:prof
Singularity> lar -c prodsingle_mu_bnblike.fcl -n 10
```

## Balsam - Get Started

Balsam instructions: https://balsam.readthedocs.io/en/latest/index.html

To test that we can run a simple job, create a Balsam site and submit a couple of test jobs. This is taken from the Balsam tutorial itself, with some modifications.

```bash=0
balsam site init balsam_testsite
[Select the default configuration for Theta-KNL]
cd balsam_testsite/
balsam site start
```

Balsam login doesn't work, but Misha can generate a token. I had to put the following block in `~/.balsam/client.yml`:

```
api_root: https://balsam-dev.alcf.anl.gov
client_class: balsam.client.BasicAuthRequestsClient
connect_timeout: 6.2
read_timeout: 120.0
retry_count: 3
token: eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOjE1LCJleHAiOjE2MjcyNDE3MDgsInVzZXJuYW1lIjoiZGVsdHV0dG8ifQ.3eyACjKc_6hwStfWLuZjyraNQDxSba4iB8g9QPaDU50
token_expiry: '2021-07-25T19:35:08.727143'
username: deltutto
```

Then we need to uncomment this line in the `job-template.sh` script:

```
# export https_proxy=http://theta-proxy.tmi.alcf.anl.gov:3128
```

This is needed because the jobs live in that external database, and we can interact with them on the login nodes since login nodes have outbound network access. But compute nodes don't have this network access, and the server (which used to be internal to ALCF and is now public) is unreachable from them. This export tunnels outbound traffic through a proxy server to let compute nodes reach the wider world.
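A quick way to sanity-check the proxy from a compute node is a one-off `curl` against the Balsam server; this is a hypothetical debugging step, not part of the original recipe, added after the `export` in the job template:

```bash=
# curl honors the https_proxy variable exported above; any HTTP status
# code printed (e.g. 200) means the proxy path to the server works
curl -s -o /dev/null -w "%{http_code}\n" https://balsam-dev.alcf.anl.gov
```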
Now the Balsam site is running, and we can create a couple of test jobs:

```bash=0
balsam app create
> Application Name (of the form MODULE.CLASS): test.Hello
> Application Template [e.g. 'echo Hello {{ name }}!']: echo hello {{ name }} && sleep {{ sleeptime }} && echo goodbye

balsam job create --app test.Hello --workdir test/1 --param name="world" --param sleeptime=2
balsam job create --app test.Hello --workdir test/2 --param name="balsam" --param sleeptime=1

balsam queue submit -q debug-cache-quad -A neutrino_osc_ADSP -n 1 -t 10 -j mpi
```

The jobs can be monitored with:

```bash=0
watch "qstat -u $USER && balsam queue ls"
```

and can be seen here: https://status.alcf.anl.gov/theta/activity. The output is placed in the `data/` directory.

## LArSoft Test Job with Balsam

Create a new app in the Balsam site that will run a prodsingle LArSoft job:

```bash=0
balsam app create
> Application Name (of the form MODULE.CLASS): prodtest.ProdSingle
> Application Template [e.g. 'echo Hello {{ name }}!']: to_be_written
```

Then open the file `apps/prodtest.py` and modify the `command_template` attribute to:

```python=0
command_template = '''
singularity run -B /lus/grand/projects/neutrino_osc_ADSP:/lus/grand/projects/neutrino_osc_ADSP:rw /lus/grand/projects/neutrino_osc_ADSP/fnal-wn-sl7.sing <<EOF

# Setup sbndcode
source /lus/grand/projects/neutrino_osc_ADSP/sbndcode/setup
setup sbndcode v09_24_02 -q e20:prof

lar -c prodsingle_mu_bnblike.fcl

ls
EOF

echo "\nJob finished in: $(pwd)"
echo "Available files:"
ls
'''
```

Save it and run `balsam sync app` to sync the newly modified app. Create a new job:

```bash=
balsam job create --app prodtest.ProdSingle --workdir prodtest_single/0
```

And check that the job has been created with:

```bash=
balsam job ls
```

Now let's run it:

```bash=
balsam queue submit -q debug-cache-quad -A neutrino_osc_ADSP -n 1 -t 10 -j mpi
watch "qstat -u $USER && balsam queue ls"
```

If it is successful, the output files will be in `data/prodtest_single/0`.

## LArSoft Production Workflow

### Introduction

Up to now we've been running jobs in `mpi` mode, in which Balsam runs on the `mom` node. This mode runs the Balsam executable and launches a bunch of `aprun` calls (the Theta version of `mpirun`), and those go onto the compute nodes. In `serial` mode it's a bit inverted: Balsam runs on each compute node (via `aprun`) and does `Popen` calls without `aprun` involved, so this is a direct fork of the process onto the compute node. The advantage here is that we can use a portion of the compute node per Balsam job, instead of a granularity of one node or higher. With 64 cores, we have been running the LArSoft jobs with 64 `lar` calls per node, and scaling out onto hundreds of nodes. For this, we end up packing in a ton of LArSoft jobs, launching one big Balsam job, and they all run together. We want to have several stages, though, which requires a workflow. When `serial` mode (or `mpi` mode, too) asks the database for jobs, it uses the list of jobs that are `PREPROCESSED`.
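As a concrete (untested here) illustration, the Hello test from above could be resubmitted in `serial` mode just by changing the `-j` flag; the launcher on each compute node then picks up `PREPROCESSED` jobs and packs them according to each job's `node_packing_count`:

```bash=
# Same submission as before, but with the serial job mode, so Balsam
# itself runs on the compute nodes and forks the jobs directly
balsam queue submit -q debug-cache-quad -A neutrino_osc_ADSP -n 1 -t 10 -j serial
```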
Jobs can also have parents and children, which will prevent them from running right away. Check out the diagram down the page here: https://balsam.readthedocs.io/en/master/userguide/app.html

We can have a job (say, prodsingle) that is a parent to a geant job, which is a parent to a detsim job, etc. We can break societal norms too: jobs can have unlimited numbers of parents or children.

One optimization that was found to help is to do something like:

- Job 1: `lar -c prodsingle.fcl -n 25`
- Jobs 2-26: `lar -c g4.fcl -n 1 --nskip {JOB_INDEX} -i singles.root`, each with Job 1 as a parent
- Jobs 27-51: `lar -c detsim.fcl -n 1 -i singles_g4_JOB_INDEX.root`, each with exactly one parent from the previous step
- Job 52: `lar -c merge.fcl -i {all detsim outputs}`, with every job from 27 to 51 as its parents

This gives a graph of jobs that Balsam will take; it dynamically runs every job whose parents have finished, and keeps going until there are no more jobs.

You can also slice between workflows with `tags`: one job could be `sbnd:g4`, one could be `uboone:prodsingle`, one could be `sbnd:detsim`, and you can select only `sbnd`, only `detsim`, only whatever.

### Creating a Workflow

The following shows an example of how we can create a prodsingle job, followed by several Geant4 jobs which have the prodsingle job as their parent.

First, we need to make two apps: one for prodsingle and one for Geant4. This is done similarly to the example above, but now we need to pass some arguments. For prodsingle, the `command_template` should look something like:

```python=
command_template = '''
singularity run -B /lus/grand/projects/neutrino_osc_ADSP:/lus/grand/projects/neutrino_osc_ADSP:rw /lus/grand/projects/neutrino_osc_ADSP/fnal-wn-sl7.sing <<EOF

# Setup sbndcode
source /lus/grand/projects/neutrino_osc_ADSP/sbndcode/setup
setup sbndcode v09_24_02 -q e20:prof

lar -c prodsingle_mu_bnblike.fcl --nevts {{nevts}} --output {{output}}

ls
EOF

echo "\nJob finished in: $(pwd)"
echo "Available files:"
ls
'''
```

While for Geant4:

```python=
command_template = '''
singularity run -B /lus/grand/projects/neutrino_osc_ADSP:/lus/grand/projects/neutrino_osc_ADSP:rw /lus/grand/projects/neutrino_osc_ADSP/fnal-wn-sl7.sing <<EOF

# Setup sbndcode
source /lus/grand/projects/neutrino_osc_ADSP/sbndcode/setup
setup sbndcode v09_24_02 -q e20:prof

lar -c standard_g4_sbnd.fcl --nevts {{nevts}} --nskip {{nskip}} --output {{output}} --source {{source}}

ls
EOF

echo "\nJob finished in: $(pwd)"
echo "Available files:"
ls
'''
```

This Python script creates the jobs:

```python=
from balsam.api import Job, App, site_config

n_events = 5
node_packing_count = 5  # how many of these jobs can run side by side on one node

#
# ProdSingle
#
prodsingle_app = App.objects.get(site_id=site_config.site_id, class_path="production.ProdSingle")

prodsingle_jobs = []
for i in range(1):
    job = Job(
        workdir=f"prod_prodsingle/{i}",
        app_id=prodsingle_app.id,
        tags={'sbnd': 'prodsingle'},
        parameters={
            "nevts": f"{n_events}",
            "output": f"prodsingle_sbnd_{i}.root"},
        node_packing_count=node_packing_count,
        num_nodes=1,
        parent_ids=(),
    )
    print(job)
    job.save()
    prodsingle_jobs.append(job)

#
# Geant4
#
geant4_app = App.objects.get(site_id=site_config.site_id, class_path="production.Geant4")

geant4_jobs = []
for i in range(n_events):
    job = Job(
        workdir=f"prod_geant4/{i}",
        app_id=geant4_app.id,
        tags={'sbnd': 'geant4'},
        parameters={
            "nevts": "1",
            "nskip": f"{i}",
            "output": f"prodsingle_geant4_sbnd_{i}.root",
            "source": f"{prodsingle_jobs[0].parameters['output']}"},
        node_packing_count=node_packing_count,
        num_nodes=1,
        parent_ids={prodsingle_jobs[0].id},
    )
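    # NOTE: parent_ids above links each Geant4 job to the ProdSingle job,
    # so these jobs sit in AWAITING_PARENTS until that parent finishes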
    print(job)
    job.save()
    geant4_jobs.append(job)
```

Running this should create the following:

```bash=
$> balsam job ls
   ID  Site                          App                    Workdir            State             Tags
67191  thetalogin6:balsam_testsite2  production.Geant4      prod_geant4/4      AWAITING_PARENTS  {'sbnd': 'geant4'}
67187  thetalogin6:balsam_testsite2  production.Geant4      prod_geant4/0      AWAITING_PARENTS  {'sbnd': 'geant4'}
67190  thetalogin6:balsam_testsite2  production.Geant4      prod_geant4/3      AWAITING_PARENTS  {'sbnd': 'geant4'}
67189  thetalogin6:balsam_testsite2  production.Geant4      prod_geant4/2      AWAITING_PARENTS  {'sbnd': 'geant4'}
67186  thetalogin6:balsam_testsite2  production.ProdSingle  prod_prodsingle/0  STAGED_IN         {'sbnd': 'prodsingle'}
67188  thetalogin6:balsam_testsite2  production.Geant4      prod_geant4/1      AWAITING_PARENTS  {'sbnd': 'geant4'}
```

The Geant4 jobs stay in `AWAITING_PARENTS` until the ProdSingle job completes.

TO BE CONTINUED

## Flux Files

### SBND Fluxes

SBND flux files in neutrino mode are in `/cvmfs/sbnd.osgstorage.org/pnfs/fnal.gov/usr/sbnd/persistent/stash/fluxFiles/bnb/BooNEtoGSimple/configH-v1/march2021/neutrinoMode/`. They have been copied to Theta at this location:

```bash=
/grand/neutrino_osc_ADSP/files/flux/sbnd/neutrino_mode/
```

In total, they take 69 GB of disk space.

### ICARUS Fluxes

### MicroBooNE Fluxes

# Table of Contents

[ToC]