# Notes and Q/A for NorESM User Workshop
### Useful links:
- The NorESM2.0 documentation page: https://noresm-docs.readthedocs.io/en/noresm2/
- The NorESM model code: https://github.com/NorESMhub/NorESM
- NorESM2.0 releases: https://noresm-docs.readthedocs.io/en/noresm2/access/releases_noresm20.html
- Low-key discussion forum: https://github.com/NorESMhub/NorESM/discussions
If you want to write in this document, click on the pen in the upper-left corner :)
If you only want view mode, click on the eye
If you want both, click on the "two pages" in between
Q/A template:
* [Your_Name_Initials]: your questions
- [Your_Name_Initials]: the answers
-
Example:
* YH: Where is the NorESM GitHub repository?
- YH: https://github.com/NorESMhub/NorESM
-
## Day 4, Tuesday, 28th Nov.
* EB: Could someone maybe provide the Zoom link for today's session, as I did not get it by mail?
- [zoomlink](https://uib.zoom.us/j/69714106941?pwd=dy9JRTh0WmdJaDJTcWF4c0hoTlEyUT09)
* YX: Can you share the slides from this morning's first session?
- MD: Yes, they will be uploaded to the repo shortly.
- TT: I have included some of the presentations on the web page, but not the one Steve is presenting now.
https://noresmhub.github.io/NorESM_Workshop_2023/#summary-of-important-links
- TT: Will include Steve's presentation when he sends it to me.
* TT: Run script example on betzy:
/cluster/shared/noresm/WORKSHOP/scripts/ReproExperimentScriptSimple.sh
/cluster/shared/noresm/WORKSHOP/scripts/ReproExperimentScript.sh
## Day 3, Monday, 27th Nov.
* TT: Presentation on the NorESM diagnostics package:
https://github.com/NorESMhub/NorESM_Workshop_2023/blob/main/presentations/noresm-diagnostic-tool.pdf
https://nordicesmhub.github.io/noresmdiagnostics/06-postprocess/
https://github.com/NorESMhub/NorESM_Workshop_2023/blob/main/presentations/cmip-data.pdf
https://github.com/NorESMhub/NorESM_Workshop_2023/blob/main/presentations/noresm-diag-basics.pdf
* SG: Can Yanchun share his current presentation in the break?
https://github.com/NorESMhub/NorESM_Workshop_2023/blob/main/presentations/noresm-diag-basics.pdf
* AG: Archiving is documented here https://noresm-docs.readthedocs.io/en/noresm2/output/archive_output.html
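A quick way to check where short-term archiving will put your output, as a minimal sketch using standard CIME commands run from the case directory (the variable names are the standard CIME ones):
```bash
# From your case directory on Betzy.
./xmlquery DOUT_S,DOUT_S_ROOT   # is short-term archiving on, and where does it write?
./xmlchange DOUT_S=TRUE         # enable it if it was off
```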
* MK: Could you please share the slide during/after the workshop?
- MD: The whole presentation?
- MK: Yes, the current one, and the slides about the overview of NorESM shown on Day 1, so I can review them later.
- TT: The slides from day 1 and 2 are available from the meeting website, under links:
https://noresmhub.github.io/NorESM_Workshop_2023/#summary-of-important-links
- MK: Thank you very much for sharing!
* HC: I have an output file from the 5-year run which has no 'h' label: N1850frc2_f19_tn14_job1.cam.i.1602-01-01-00000.nc. Is this a yearly mean or something else?
- AKG: Have you checked the data in the folder /cluster/work/users/\$USER/archive? It should be in the archive folder. Can you provide your username?
- HC: user name: cheh. Path of the output: /cluster/work/users/cheh/archive/N1850frc2_f19_tn14_job1/atm/hist
- TT: I can't see your work directory; /cluster/work/users/cheh is only open for read access to the user.
- TT: Sigma2 guidelines for sharing work directory: https://www.sigma2.no/now-sharing-files-becomes-easier
- HC: Changed the permission with `chmod o+rx $USERWORK`
- AKG: There are also N1850frc2_f19_tn14_job1.cam.h0* files. You can check again.
- TT: about the original question, I think "i" should indicate instantaneous output.
- HC: Thanks!
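To see what ended up in the archive and peek at the file headers, something like the sketch below works (paths are the ones from the thread above; the netCDF module name/version is an assumption, so check `module avail netCDF` on Betzy first):
```bash
cd /cluster/work/users/cheh/archive/N1850frc2_f19_tn14_job1/atm/hist
ls *.cam.h0.*                           # monthly-mean history files
ls *.cam.i.*                            # instantaneous/initial files
module load netCDF/4.9.0-iompi-2022a    # assumed module name; pick one from 'module avail netCDF'
ncdump -h N1850frc2_f19_tn14_job1.cam.i.1602-01-01-00000.nc | less
```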
* YY: Thanks for sharing the presentation. Is it possible to have access to the presentations before/during the lectures? It helps with taking notes :)
- YH: I will try to put them up.
* MK: Do you use results from the diagnostic tool package in papers? Or is the tool used to check if the simulation was successful?
- TT: Normally you need to do further post-processing for publication-level output. The diagnostic tool is primarily used to check output performance, but we also use its output in some early presentations, or when discussing model output in internal meetings.
- MK: Thanks for the reply.
- TT: Most people use python (xarray) or Matlab for further processing, but we don't have a common collection of these tools.
- MK: Ok, I will use this tool effectively.
- AG: Running diag_run on your data will produce a lot of netCDF files with the results from the diagnostics package; these files you can easily plot using python, which is much faster than repeating the analysis yourself. But most of us have our own scripts for doing analysis, and only use diag_run for checking simulations and for presentation of results and such.
* RS: It says that the diagnostics package is not supported yet on Fram. Will it be in near future?
- AKG: Sorry, there is no plan for support on Fram; on Betzy we set it up only for tutorial purposes. The reason is that we cannot store data on Fram/Betzy/Saga for more than 1 month.
- AG: We use diag_run on NIRD usually
- RS: Ok, I understand. Thank you!
* TT: Alternative NorESM output, if you don't have your own 5 year run:
/cluster/shared/noresm/NorESMWS2023/archive/
* YH: connecting to Betzy with port forwarding:
`ssh -L 8080:localhost:8080 <user-name>@betzy.sigma2.no`
Then change to the directory with diagnostic webpage output, then in the shell:
`python3 -m http.server 8080 --bind 127.0.0.1`
If one doesn't use port forwarding to access Betzy files from a local browser, one can always download the tarball file, e.g.:
`/cluster/work/users/yanchun/diagnostics/out/CAM_DIAG/diag/NHISTfrc2_workshop2021`
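If port forwarding is not an option, a sketch of pulling the diagnostics output down with scp instead (run on your local machine; the username and path are the example above, replace with your own):
```bash
scp -r yanchun@betzy.sigma2.no:/cluster/work/users/yanchun/diagnostics/out/CAM_DIAG/diag/NHISTfrc2_workshop2021 .
# ...then open the HTML page (e.g. index.html, if present) from the copied directory in a local browser.
```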
* TT: In case of problems with module conflicts on Betzy, try to run `module purge` and try again.
* LP: I still have problems allocating:
```
$ salloc --nodes=1 --mem-per-cpu=12G --time=00:30:00 --partition=preproc account=nn9039kk
salloc: error: Account specification required, but not provided
salloc: error: Job submit/allocate failed: Invalid account or account/partition combination specified
```
- TT: I think you need `--account=nn9039k` instead of `account=nn9039k`
- LP: Thanks!
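For reference, the working allocation command after the fix (note the leading `--` on the account flag, and the single `k` in the project name):
```bash
salloc --nodes=1 --mem-per-cpu=12G --time=00:30:00 --partition=preproc --account=nn9039k
```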
* AZ: Maybe I missed the information, but what if we would like to use the preinstalled package and we don't have access to the NS2345K project (on either Betzy or NIRD)?
- TT: On Betzy, it is not linked to nn2345k; it is in the shared directory:
alias diag_run='/cluster/shared/noresm/diagnostics/noresm/bin/diag_run'
alias diag_srun='/cluster/shared/noresm/diagnostics/noresm/bin/diag_srun'
- AZ: Thanks! I just saw the path was at the beginning of the paragraph even though they kept using the NS2345K path :)
* MK: Should I create an output directory before running "diag_srun", or will it be created automatically?
- TT: diag_run should create an output directory, but the root of the output directory probably needs to exist before running diag_run.
- MK: Ok, thanks.
- RS: What do you mean by the root of the output directory needing to be available?
- TT: I thought the directory -o <output/directory> should exist before running diag_run, but it seems this is not the case after all.
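For orientation, a diag_run call might look something like the sketch below. Apart from `-o` (the output directory discussed above), the flags and values are assumptions based on the diagnostics documentation, so check `diag_run --help` for the exact options:
```bash
# Assumed flags: -m component, -c case name, -i archive root, -o output root.
diag_run -m cam -c N1850frc2_f19_tn14_job1 \
    -i /cluster/work/users/$USER/archive \
    -o /cluster/work/users/$USER/diagnostics/out
```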
* JC: If I run diag_run with passive-mode (-p), how can I check the progress, any log file?
* TT: I think the passive mode should only produce configuration files, not diagnostic output, so I don't think there is a log file for this (not that I know of). See also https://ns2345k.web.sigma2.no/people/yanchun/html/noresmdiagnostic.html
* LP: To set the input data and output data on Betzy, and the webpage path on NIRD, which path on NIRD should I use? Should I mount NIRD on Betzy first?
* TT: You will not be able to output to Nird if you run on Betzy. Also, if you run on compute nodes it is not possible to mount Nird (Nird is already mounted on the login nodes). If you have access to a Nird account, it is better to copy the output data to Nird first, then run "diag_run" and output to the project "www" directory. As a work-around, you should be able to copy the tarball output from Betzy, either to your local computer or to Nird.
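A sketch of that work-around, copying finished diagnostics from Betzy to NIRD (the login hostname is the standard Sigma2 one; the destination project path is hypothetical, use your own project's www area):
```bash
# Run on a Betzy login node; replace the NIRD project path with your own.
rsync -av /cluster/work/users/$USER/diagnostics/out/ \
    $USER@login.nird.sigma2.no:/projects/NS0000K/www/diagnostics/
```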
## Day 2, Tuesday, 21st Nov.
* [AZ]: What happens if I set 'branch' when it should be a 'hybrid' run, and vice versa?
* [AKG]: In the case of a branch run, it would fail if the data are not from the same time point. But here we provided you with data from the same time point, so it would work both for hybrid and branch. Usually data for hybrid cases are from different time points; we collect data mostly from individual stand-alone cases and use those data to create a hybrid case.
* [OH]: What is recommended for documenting namelist changes? Do namelist changes get documented like xmlchange changes do?
* [TT]: If you make namelist changes in user_nl_* files, these will be copied with the model output during archiving.
In the archive output, there is a 'case' directory where the entire case setup is copied.
[AG]: Also, if you make e.g. parameter changes or change files in user_nl_cam, these changes will appear in CaseDocs/atm_in. The atm_in file will also be in the run folder, and you will find the values/filenames in the atm.log file in the run folder; this file will be archived under Casename/logs/.
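As a small illustration of the above, run from the case directory (`nhtfrq` is just an example CAM namelist variable; `preview_namelists` is a standard CIME script that regenerates CaseDocs without building):
```bash
echo "nhtfrq = 0" >> user_nl_cam   # example change: monthly-mean history output
./preview_namelists                # regenerate the namelists in CaseDocs/
grep nhtfrq CaseDocs/atm_in        # the change is now documented in atm_in
```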
* [HC]: What is the difference between ./case.setup --reset and ./case.build --clean? If I want to rebuild, which should I run?
* [AG]: case.setup --reset is used if you want to modify/change the number of nodes or processors you are running on, i.e. if you want to change the env_mach_pes.xml file. If you have already built, then you have to clean or reset the build as well. If you don't want to change the number or partitioning of processors, but you want to make changes to your experiment setup which require a rebuild (e.g. if you want to run with COSP and you forgot to activate that option the first time you built), you only need to clean or reset your build and not the setup.
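In command form, a sketch of the two situations described above (standard CIME scripts, run from the case directory):
```bash
# Changed the PE layout (env_mach_pes.xml): reset the setup and rebuild from scratch.
./case.setup --reset
./case.build --clean-all
./case.build

# Changed only build-time options (e.g. turned on COSP): cleaning the build is enough.
./case.build --clean
./case.build
```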
* [EB]: Maybe I missed it, but what is the case directory for the reference case in the hands-on session, for the branch run?
[YH]: It is good if you can first start a 'piControl' as a startup run, and then start a historical run as a branch run, with the previous piControl as the reference case.
[TT]: The reference case is "1850_f19_tn14_110629", the path is "/cluster/shared/noresm/WORKSHOP/"
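A sketch of the corresponding xmlchange calls for the branch run, using the reference case and path quoted above (standard CIME variables; the refdate shown is the one used elsewhere in these notes, so adjust it to match your restart files):
```bash
# From the new case directory.
./xmlchange RUN_TYPE=branch
./xmlchange RUN_REFCASE=1850_f19_tn14_110629
./xmlchange RUN_REFDIR=/cluster/shared/noresm/WORKSHOP/
./xmlchange RUN_REFDATE=1600-01-01   # refdate mentioned later in these notes; adjust to your restart files
./xmlchange GET_REFCASE=TRUE
```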
* [NB]: What is the simplest way of checking if a run was successful?
* [TT]: In the case directory, check "CaseStatus". If the run is successful, it will say so there. If the case is still running, it should be visible when you do "squeue -u $USER".
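In practice (commands from the answer above; the exact wording in CaseStatus may vary slightly):
```bash
squeue -u $USER     # is the job still queued or running?
tail CaseStatus     # in the case directory; look for a success entry for case.run near the end
```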
* [RS]: Do we need to change RUN_REFDIR? I got this error when building: ERROR: Could not download refcase from any server
* [TT]: If you access the restart files by reference, you need to set RUN_REFDIR. However, you should be able to copy the restart files into your run directory after doing case.setup
* MK: If the job crashed, which step should I go back to, and how can I start again?
* [SG]: What was the crash?
* MK: Sorry. After I ran "./case.submit", the job stopped and the log file "CaseStatus" said "case.run error ERROR: RUN FAIL:...".
* [TT]: Is there a path to a log file as well? If so, can you find an error message there?
* CaseStatus is mostly just useful to see if something was successful or if it failed. You need to dig a bit deeper to find the actual error messages.
* There should be a log file in the run directory: `cesm.log.<jobid>`
* [SG]: Can you paste the run directory here?
* MK: Thank you for your help. I found that I did not change "GET_REFCASE" in env_run.xml from FALSE to TRUE. I think this is the reason why the job crashed, so I should change "GET_REFCASE". Can I change this parameter in env_run.xml, then run ./case.setup, ./case.build, and ./case.submit again?
* [SG]: Use xmlchange, it will tell you if you need to run case.setup
* [SG]: It is always safer to use xmlchange
* MK: Thank you for your help!!
* [SG]: I think you can change this and then case.submit but check the output of xmlchange to be sure
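i.e., a minimal sketch using standard CIME commands, run from the case directory:
```bash
./xmlchange GET_REFCASE=TRUE
./xmlquery GET_REFCASE,RUN_REFDIR   # verify the values before resubmitting
./case.submit
```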
* [CH]: I changed RUN_REFDIR and set GET_REFCASE to true, but I also still got the error "Could not find all input data on any server". However, the restart files seem to be in my run directory. How can I go from here?
* [SG]: What is your RUN_REFDIR? Maybe some file is missing. On the other hand, if the files are in your directory, try turning GET_REFCASE to FALSE so it does not try to copy the files.
* [CH]: It's /cluster/shared/noresm/WORKSHOP/ . Thanks I'll try that.
- [MD]: if GET_REFCASE is true you might get something wrong
* [CH]: Changing GET_REFCASE seemed to work
* [EB]: Case with ice ridging error: /cluster/work/users/estherbe/NHistfrc2_f19_tn14_test02_20231121/run
* [SG]: I do not have permission to see inside that directory. Can you do:
* [SG]: chmod -R go+r /cluster/work/users/estherbe
* [SG]: chmod go+rx /cluster/work/users/estherbe/NHistfrc2_f19_tn14_test02_20231121/run /cluster/work/users/estherbe/NHistfrc2_f19_tn14_test02_20231121
* [RS]: Should DOUT_S_SAVE_INTERIM_RESTART_FILES be TRUE or FALSE in case B?
- [MD]: Try both :)
* [NB]: I am trying task 3. However, the run fails. I checked the cesm.log file and, if I am correct, the error is "ERROR: GETFIL: FAILED to get ./N1850_f19_tn14_11062019.cam.rs.1600-02-01-00000.nc". The refdate is set to 1600-01-01.
- [SH]: I got this as well. I think it was because after the 1-month run is completed, it produces restart and rpointer files, and the newly produced rpointers overwrite the initially copied rpointers, which creates a mix-up. Copying the initial restart files again to the rundir should fix it. (This issue can probably be avoided by providing the refdir and get_refcase=true instead of manually copying the restart files to the rundir.)
- [NB]: Thanks! Yes I copied the files again and it worked.
* [AS]: On exercise C, it says change WALLTIME. Does this refer to JOB_WALLCLOCK_TIME? What should it be set to?
## Day 1, Monday, 20th Nov.
Example output from example1:
- run directory with logs:
/cluster/work/users/tomast/noresm/N1850frc2_f19_tn14_test01_20231120
- archive directory with data:
/cluster/work/users/tomast/archive/N1850frc2_f19_tn14_test01_20231120
- [AZ]: Betty and I have the same problem opening netCDF files. We get the following message:
` [...] Note: could not open file /cluster/home/adelez/.ncviewrc for reading`
I checked access using ssh -Y and -XY, and also checked the loaded modules; we have ncview/2.1.8-gompi-2021a
- [AG]: are you using a mac? See discussion at the end
- [AZ]: We solved the issue. Just to let you know for future use:
- it doesn't work from VS Code, because it doesn't have the "ssh -Y" visualisation enabled
- 'module load ...' needs to be repeated also in the userwork directory (if you want to add that to the notes)
* MK: Could you please share slides during/after the workshop?
- Yes, I'll post the link soon: https://docs.google.com/presentation/d/13kPl_jgbhuAgJq-E6zKO_wgp04JdHEtyEu-x2hiYzbc/edit?usp=sharing
* [MD]: Can you add Adele to the nn9039k? (username: adelez)
* MD: FYI, if you want to clear your terminal use `clear`
* [PB]: Sorry, I am stuck in the first step. I can't find the script ./create_newcase in ~/NorESMworkshop2023/NorESM/cime. Can someone help me?
- [TT]: the full path, starting from the NorESM folder, is ./cime/scripts/create_newcase
- [TT]: The first step in the instructions was to do: cd ~/NorESM/cime/scripts
- [TT]: You can also run it directly using the full path: ~/NorESM/cime/scripts/create_newcase ...
- [PB] Got it! Thank you very much!
- [TT] Great!
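For reference, a sketch of a full create_newcase call. The case name is the one used later in these notes; the compset, resolution, and project flags are assumptions based on that name and the workshop project, so follow the exercise text for the exact values:
```bash
cd ~/NorESM/cime/scripts
./create_newcase --case ~/cases/N1850frc2_f19_tn14_test01_20231120 \
    --compset N1850frc2 --res f19_tn14 \
    --machine betzy --project nn9039k --run-unsupported
```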
* [PB]: I got this error when doing ./case.build: ERROR: /cluster/home/$USER/NorESMworkshop2023/NorESM/cime/src/build_scripts/buildlib.gptl FAILED, cat /cluster/work/users/$USER/noresm/N1850frc2_f19_tn14_test01_20231120/bld/gptl.bldlog.231120-114205
* [TT]: Do you see any specific error message in the log file gptl.bldlog.231120-114205 (e.g. using less)?
* [PB]: Where can I find this log file?
* [TT]: After ERROR in the line you wrote, it is the full path to the log file.
* [PB]: Oh I see. yes. Here's the error:
```
gmake: Leaving directory '/cluster/work/users/$USER/noresm/N1850frc2_f19_tn14_test01_20231120/bld/intel/openmpi/nodebug/nothreads/gptl'
Catastrophic error: could not set locale "" to allow processing of multibyte characters
compilation aborted for /cluster/home/$USER/NorESMworkshop2023/NorESM/cime/src/share/timing/gptl.c (code 4)
gmake: *** [gptl.o] Error 4
ERROR: Catastrophic error: could not set locale "" to allow processing of multibyte characters
compilation aborted for /cluster/home/$USER/NorESMworkshop2023/NorESM/cime/src/share/timing/gptl.c (code 4)
```
* MD: if you get a locale error, do:
```bash
export LC_ALL=en_US.UTF-8
```
* [PB]: Thanks! Looks like something is running.
- [PB]: Would be nice to go a bit slower; I want to understand what is going on at each step.
- [PB]: Can you please explain the reason for setting this variable, LC_ALL=en_US.UTF-8?
- [MD]: It is not set by default. If you don't want to do this every time, put that line in your `.bashrc`
- [PB]: Noted. Thanks MD.
- [TT]: The locale setting LC_ALL sets the format used for e.g. dates and times. It's a basic system setting.
- [PB]: I see. Thank you!
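i.e., to make the locale fix permanent:
```bash
echo 'export LC_ALL=en_US.UTF-8' >> ~/.bashrc
```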
* [HC]: error in build:
```
ERROR: Command /cluster/home/cheh/NorESM_code/NorESM/components/clm/bld/build-namelist failed rc=2
out=
err=Can't locate XML/LibXML.pm in @INC (you may need to install the XML::LibXML module) (@INC contains: /cluster/home/cheh/NorESM_code/NorESM/components/clm/bld /cluster/home/cheh/NorESM_code/NorESM/components/clm/bld /cluster/home/cheh/NorESM_code/NorESM/cime/utils/perl5lib /cluster/home/cheh/NorESM_code/NorESM/components/clm/bld /node/lib/perl5 /cluster/lib/perl5/x86_64-linux-thread-multi /cluster/lib/perl5 /cluster/projects/nn9600k/haochi/mambaforge/lib/perl5/5.32/site_perl /cluster/projects/nn9600k/haochi/mambaforge/lib/perl5/site_perl /cluster/projects/nn9600k/haochi/mambaforge/lib/perl5/5.32/vendor_perl /cluster/projects/nn9600k/haochi/mambaforge/lib/perl5/vendor_perl /cluster/projects/nn9600k/haochi/mambaforge/lib/perl5/5.32/core_perl /cluster/projects/nn9600k/haochi/mambaforge/lib/perl5/core_perl .) at /cluster/home/cheh/NorESM_code/NorESM/cime/utils/perl5lib/Config/SetupTools.pm line 5.
BEGIN failed--compilation aborted at /cluster/home/cheh/NorESM_code/NorESM/cime/utils/perl5lib/Config/SetupTools.pm line 5.
Compilation failed in require at /cluster/home/cheh/NorESM_code/NorESM/components/clm/bld/CLMBuildNamelist.pm line 414.
```
- HC: This turned out to be related to the conda environment. Solved after deactivating conda. But then another error came:
```
^CERROR: /cluster/home/cheh/NorESM_code/NorESM/cime/src/build_scripts/buildlib.csm_share FAILED, cat /cluster/work/users/cheh/noresm/test2/bld/csm_share.bldlog.231120-120617
```
- MD: just `./case.build --clean-all` before building the case again.
* [PB]: I loaded ncview as instructed and I have logged in using ssh -Y, but I am getting the following error:
Note: could not open file /cluster/home/prba3626/.ncviewrc for reading
Error: Can't open display:
* [AG]: Seems like a login error. Are you sure you used -Y ? Try again?
* [PB]: Yes, I have sshed using -Y command. And I have tried again. Still getting the same error.
* [AG]: Which ncview module did you load?
* [PB]: /cluster/software/ncview/2.1.8-gompi-2021a/bin/ncview --> this is what I get when I do 'which ncview'
* [AG]: I don't know why you get the error, but can you try module load ncview/2.1.7-intel-2019b just to see if it is related to your ncview version
* Are you using a mac? https://stackoverflow.com/questions/72870054/note-could-not-open-file-home-ncviewrc-for-reading-error-cant-open-display
* [PB]: Yes, I am using a Mac. And I tried loading ncview/2.1.7-intel-2019b. Still getting the same error.
- MD: Probably ssh did not have -Y option? `ssh -Y yourusername@betzy.sigma2.no`
- PB: No, I have ssh'ed using x11 forwarding. ssh -Y
- MD: Mac should have X11 by default. Can you run `xclock` on Betzy? Or `echo $DISPLAY`?
- PB: xclock --> Error: Can't open display:
- MD: Then it is for sure a problem with X11 on the client.
* [PB]: Okay! I guess I need to install Xquartz.
* [RS]: I also get the same error. Did you manage to make it work?
* [HY]: Q: What is "pe-hrs" in the timing log? Thanks!
* [MD]: basically CPU-hours
* MK: If I create a new case with a compset name e.g. "NSSP126frc2", will NorESM run the simulation with the SSP1-2.6 scenario automatically?
- MD: If the compset is supported then yes.
* [MM]: Error during submit: am I added to nn9039k?
Submitting job script sbatch --time 00:59:00 --account nn9039k .case.run --resubmit
ERROR: Command: 'sbatch --time 00:59:00 --account nn9039k .case.run --resubmit' failed with error 'sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified' from dir '/cluster/home/mmu072/cases/N1850frc2_f19_tn14_test01_20231120'
- you can check which projects you have access to by writing **cost** in the terminal. The command will give you a list of projects that you can use. If you don't see nn9039k, you need to let us know :)
[MM]: I do not have nn9039k in my list. Could you add me? user "mmu072".
[TT]: It will take some time. Do you have access to some other project on Betzy?
This is because we need approval from the project owner, and he is not available in our meeting.
[MM]: Ok, no problem. Yes, I can use nn9824k instead. Thanks!