# HPC course notes - Day 3
## Presentation material
- Folder with presentation material: https://drive.google.com/drive/u/1/folders/10VuU7REr4xlijXzg7R7wxfzTsamXHqJw
- NIRD Toolkit: https://apps.sigma2.no
- [Asking for help with supercomputers](https://cicero.xyz/v3/remark/0.14.0/github.com/bast/help-with-supercomputers/main/talk.md/)
- [Using SSH with public-key authentication](https://git.app.uib.no/mvhulten/hpc-user-course-ssh-key-authentication/-/blob/master/crypto-users.sent)
## Notes
- What is a CPU in this context? ~500 cores on NIRD-TOS? How many nodes?
- I think we have 8 physical nodes on each site.
- 500 "cores" on one site are likely virtual cores, so there is likely some oversubscription. If each of the 8 nodes had 32 physical cores (8 × 32 = 256 physical cores per site), the oversubscription factor would be about 2.
- Ok, and by "oversubscription" you mean that the resources you request are not _reserved_ for your application? So the 256 physical cores can hold 512 single-core applications before being "full"?
- You can find more info on https://www.sigma2.no/systems#nird-service-platform
- When I registered for Dataporten with a Feide guest account, the login credentials were not connected to my user account on NIRD. So the resulting dashboard says "Your account is not allowed to register personal applications and APIs." How can I resolve this issue?
- Here is the answer from the SP team: "UiO users can only use Feide against services that have been 'activated' by UiO. Since there is no such thing as a wildcard activation, and we cannot ask UiO to manually activate every service created in the NIRD Toolkit, we use openfeide."
- So if you are a UiO user you need to use OpenIdP (https://openidp.feide.no/).
- We do see the usability issue here, but sorry to say this comes from agreements and is not an implementation limitation.
- Thanks. In terms of OpenIdP access, is there somewhere I can submit my project information online to connect to the NIRD Toolkit? Or does this have to be done by the project manager by emailing support?
- If you could send an e-mail to sigma2@uninett.no I'll gladly assist you with this.
- Thanks, already did
- Who is the "optimal" person to make/enable/change the top docker image of a project? The project leader?
- A lot of the colleagues in my group who are project leaders (professors) do not know how to change a docker image.
- The PI enables the NIRD Toolkit, but anyone in the group can install the packages if the Toolkit is enabled for the project. More info here https://documentation.sigma2.no/nird_toolkit/package-install.html#appendix-a-the-meaning-of-each-of-common-fields-in-the-installation-form
- (from zoom chat) Smallest machine is "Base 1CPU 1GB", right?
- yes, you are right, and please use machine type base
- **When group members are given access to an application, will they effectively be using the group leader's account on NIRD to access data and docker images? Will they all have read and write access to the files on NIRD? How to ensure that one doesn't accidentally break things?**
- You use your own account.
- Okay, but do you gain write access to the project manager's files?
- I don't think so, but the SP team will answer how this works
- I think the question is for files stored under the "project space" (similar to /cluster/projects/nn... on the HPC systems). On the HPC systems all project members have write access.
- I'm still on "fetching events". Is the initialization a bit slow for others as well?
- seems it was also slow for others ... we need to investigate what is the root cause for this
- On my application page it shows "Failing" with the message "The application could not start due to: The list of events may contain more details explaining what is causing the application to fail." What's the reason? Below, under "Recent events | fetching events...", it shows: "2m Back-off restarting failed container backoff"
- problem might be too high load ... so maybe try again later
- if problem persists please let us know again
- Will the minio data download link be available to share and download after the application has been closed? Or should it only be closed once the data has been downloaded by the intended recipient? i.e. Does the data and its download link persist after closing the application?
- The download link persists until you close the application. When you delete the bucket the data won't be available there.
- I have accepted the invitation to join the group CRaaS1, ns9989k and Dataporten App Creators on Dataporten. When trying to install or look at any of the services I receive the notification: You are not allowed to use any projectspaces.
- Could you send an email so that I can check your username, or send me a private chat message on Zoom (Dhanya Pushpadas)?
- On it!
- In this Jupyter, are the Python packages already installed after setup?
- More info here https://documentation.sigma2.no/nird_toolkit/package-install.html#appendix-a-the-meaning-of-each-of-common-fields-in-the-installation-form
- Can you explain again where we could check/change the docker images used by the Jupyter notebook?
- it is coming in the next session
- Thanks! :)
- For jupyterhub authorized groups, you referred to a class and creating a new group. Can you explain how to create it?
- we have an example of that on the documentation pages:
https://documentation.sigma2.no/nird_toolkit/getting_started_guide.html
- the groups are created here:
https://minside.dataporten.no/#userinfo
- What happens if someone (student) forgets to logout?
- service runs until someone stops it (admin or student)
- Services that are still running in the course namespace after the entire course is over will be stopped and deleted by us.
- I have accepted the invitation to join the group CRaaS1, ns9989k and Dataporten App Creators on Dataporten, but when trying to install jupyterhub, for the Projectspace, I only see my research project as the only choice.
- Please try logging out and back in to the Toolkit, to ensure that your group access is refreshed.
- Yes, now the craas1-ns9989k shows up
- Great! 👌
- Not clear to me: Is the point with the NIRD Toolkit that you can run tools/portals/codes to analyze and visualize massive data stored on NIRD (yourself or students/others that you invite), or is this unrelated to actual NIRD storage? For example, just to teach someone to use Jupyter notebooks in a course?
- The Toolkit can be used for both scenarios actually. A namespace in the toolkit is directly connected to a NIRD storage project so that you have access to any files stored there, which makes it possible to run analysis on the data without having to stage it first.
- .. and we are using it right now in the course to teach you how it works. The namespace "craas1-ns9989k" that you have access to is connected to the NIRD storage project "NS9989K".
- Is it just me, or do others see this too: my jupyterhub is stuck initializing and always shows "Started container pause".
- Looks like there is only one deployment that is stuck right now. Have you tried deleting it and deploying a new one?
- I just deleted the jupyterhub installation and tried to install minio. It also keeps initializing, and shows "1m Back-off restarting failed container backoff" and "2m Ingress craas1-ns9989k/minio-1616061065-minio update". Since it keeps initializing, I tried reconfigure, and it shows "Persistent storage No persistent storage found in this projectspace". Can someone help? So far, none of the installations has succeeded.
- If you could try another deployment and leave it, I'll check the logs.
- I deleted all the installations and tried a new minio install now, but same problem: it keeps initializing and shows "1m Back-off restarting failed container backoff".
- Looks like you are deploying with OpenIdP? Log in to the toolkit with Feide from your institute and try deploying again. Make sure that you have joined the Dataporten group with that Feide user as well.
- **What is the difference between the docker image and the proxy image?**
I think this question is related to rstudio, since there are two docker images you are allowed to change in the advanced configuration of rstudio. The one pointing to quay.io/uninett/rstudio-server:tag is the one you should change if you would like to add software.
The proxy image sets up an nginx proxy; you would probably never need to change this.
- I tried to install and load some libraries in R (installing gave no errors) and got this error:
- > library(sf)
Error: package or namespace load failed for ‘sf’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/usr/local/lib/R/site-library/units/libs/units.so':
libudunits2.so.0: cannot open shared object file: No such file or directory
- Does it have to do with the docker image?
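
On the `library(sf)` error above: the last line of the message says that the dynamic loader could not find `libudunits2.so.0`. That usually means the udunits2 system library is missing from the running container image, even though the R package itself installed. Below is a minimal sketch for checking this from inside the container, assuming Python 3 is available there. The helper name `shared_library_visible` is made up for this example, and the lookup it uses may not match what R's `dyn.load` sees in every setup.

```python
from ctypes.util import find_library


def shared_library_visible(name):
    """Return True if the system's library lookup can locate lib<name>.

    `name` is given without the "lib" prefix and without the ".so" suffix,
    so "udunits2" stands for libudunits2.so.*.
    """
    return find_library(name) is not None


if __name__ == "__main__":
    # "udunits2" is taken from the error message; the result depends on
    # which system packages the image contains.
    for name in ("udunits2", "m"):
        status = "found" if shared_library_visible(name) else "NOT found"
        print(f"lib{name}: {status}")
```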
---
## Asking for help & Login via ssh keys
- Thanks for these pointers, will definitely use these approaches when interacting with my user base ;)
- Do you have any tips for using emacs locally and logging in to the cluster through ssh?
- Can we run eshell on the remote or is this not recommended?
- I have seen this: https://documentation.sigma2.no/getting_started/editing_files.html, but it does not give a lot of practical tips.
- (I am not an Emacs user so I cannot answer the first question.) But I agree we need to improve the documentation for this. It can be really useful to edit files on a remote resource with a local editor, and we should show how.
- Especially if we are not supposed to change the .bashrc file!
- let me clarify (I was brief): it is OK to change .bashrc for everything that does not involve calculations. totally ok to set up an environment there for "working"/editing. only the computations should in my opinion not depend on .bashrc, both to simplify debugging and to allow for reproducibility of calculations
- I'm looking into finding a good setup myself. Do you accept issues and PRs for the documentation?
- yes! PRs very welcome towards: https://github.com/UNINETTSigma2/documentation
- Here is a good explanation for emacs and tramp set-up: https://willschenk.com/articles/2020/tramp_tricks/
- Follow up: which editor do you (instructors) use on the clusters? vim/vi? Do you have any recommended setups you could share?
- The default configuration is not very friendly/good I think
- we use the editor we are most comfortable with (some use emacs, some vim/vi, some something else) ... everyone should do the same, particularly if you edit many files ... it shouldn't be an additional hassle in your daily work
- I use vim with the following plugins: https://github.com/bast/config/blob/main/install.sh#L11-L17 and following configuration: https://github.com/bast/config/blob/main/vimrc
- How many ssh keys should one typically have? I guess it is ok to use the same key to log in to Saga, Fram, Betzy etc? How about further connecting to github etc from these machines?
- I use one keypair per hardware device that I own. So I have one on my desktop and another keypair on my laptop. And then I put these two public keys on all the services I need to get to.
- To access GitHub you could use key forwarding but there I actually prefer to either clone to my laptop and scp to cluster or to create a separate key for the cluster which can only clone but cannot "write" to GitHub. But if you want to do development on the cluster, it might be useful to create a keypair on the cluster or to key forward.
- Is there anything special about the ssh-keys on the HPC systems compared to other systems? I have ssh-key login set up for several local servers and they work. I tried setting it up for Saga in the last few days, and just now following the full instructions, and I'm still being asked for the password each time. Any suggestions what could go wrong?
- in principle there should be no difference. two possible reasons why it fails (try also ssh -v host, to get verbose debug output): wrong file permissions for your authorized_keys or maybe you use a protocol which is not supported (too old or too new). i am unsure whether Saga understands ed25519 keys for instance, i need to check. but if you use RSA keys, it should work.
- ssh -v says "host key algorithm: ecdsa-sha2-nistp256" - does that mean ed25519 and RSA are not supported?
- i wanted to write that RSA is definitely supported but i should check :-) personally i am using ed25519 but some clusters' operating systems don't support it yet
- Ed25519 is supported on any recent installation, including the clusters' login nodes (CentOS).
- Note that the *host key* is something different from the *public-key authentication key pair*. The former lets your client verify the server's identity when the connection is set up, before you log in; the latter is the one that matters most for the user (`ssh-keygen` etc.).
- A follow-up on the editor point above: As a more novice user of HPC I'm more used to GUI editors. Some offer the possibility to log in via SSH (e.g., VisualStudioCode), so that you can edit files on a server basically from your own machine with all the GUI 'benefits'. Is it possible to do so also on the Metacenter machines?
- I think this is possible. I haven't tried but it should be possible and we should document how.
- Cool, thanks for the reply. Maybe I'll just try and see whether it works.
- it should work because for the cluster it will look like any other ssh connection and the cluster has really no way of knowing that there is an editor on the other side. one thing that might happen is that the login node gets a bit overwhelmed if the editor "bombs" it with too many requests but i don't think this is a concern and let's solve that if that happens.
- This is also possible to do with emacs
- I have problems installing any of the apps in the NIRD Toolkit. Marius helped me a bit and told me to use my UiO account instead of OpenIdP. I am quite confused: which one should we use exactly? None of them has installed successfully so far.
- the last time i used the NTK from a UiO account (maybe a year or two ago) I think I actually used the UiO account to launch/install the app, but used the OpenIdP account to access the running service ... very confusing also for admins I think
- That approach worked for me as well today. Not super user friendly...
- i think some legal work/agreement needs to be done (databehandleravtale, i.e. a data processing agreement) to simplify this ... it's ongoing work, IIRC
- Where can we get the nird toolkit demo-videos that we can see later?
- somewhere (eg a Google drive) via links on the course page
- will let you know via email
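
Coming back to the question above about key-based login to Saga still asking for a password: one of the suggested causes was wrong file permissions. With OpenSSH's default `StrictModes` setting, the server ignores `authorized_keys` if that file, `~/.ssh`, or the home directory is writable by other users. Below is a minimal sketch of that check, to be run on the cluster side. The function names are made up for this example, and it only looks at the group/other write bits, not at ownership.

```python
import os
import stat


def writable_by_others(path):
    """Return True if `path` is writable by group or others, None if missing."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return None
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def check_ssh_permissions(home):
    """Map each path relevant for key login to its writable_by_others() result."""
    paths = [
        home,
        os.path.join(home, ".ssh"),
        os.path.join(home, ".ssh", "authorized_keys"),
    ]
    return {path: writable_by_others(path) for path in paths}


if __name__ == "__main__":
    for path, result in check_ssh_permissions(os.path.expanduser("~")).items():
        if result is None:
            print(f"{path}: missing")
        elif result:
            print(f"{path}: too open, group/others can write")
        else:
            print(f"{path}: ok")
```

If a path is reported as too open, `chmod 700 ~/.ssh` and `chmod 600 ~/.ssh/authorized_keys` are the commonly used settings.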