
ARCHIVE Intro to High Performance Computing - Summer 2021

This document is used to archive content moved from the live HackMD document. Only members of ASC can edit it.


Intro to High Performance Computing

╔═══════════╗ ╔═════════════╗
║   WEB     ║ ║  TERMINAL   ║
║  BROWSER  ║ ║   WINDOW    ║
║  WINDOW   ║ ╚═════════════╝
║   WITH    ║ ╔═════════════╗
║   THE     ║ ║   BROWSER   ║
║  STREAM   ║ ║   W/HACKMD  ║
╚═══════════╝ ╚═════════════╝

07/06 - Intro to High Performance Computing

Icebreaker - What is your background and what brings you here?

  • I'm staff
  • I am a PhD student and need computing power for bioinformatics
  • I am a PhD student and need computing for classification
  • I am a PhD student also. I need computational resources/Skill.
  • Postdoc at Aalto
  • summer intern at CS department, working on deep learning, so need a lot of computational power
  • Summer intern at NuME group @Aalto
  • Master Student/ summer intern at Ambient Intelligence Group, Aalto. Hoping to learn how to use the computational resources for Machine Learning
  • Master Student at Aalto. Thesis about radiation shielding for satellites. Need to run Monte Carlo particle simulations and need lots of CPU power.
  • Summer trainee at Aalto Energy conversion group
  • Research assistant at Aalto
  • Summer intern at NMR Research Unit, University of Oulu. Looking forward to getting a more general idea of HPC-related things.
  • Master's student / summer intern in the Computational Field Theory group of the University of Helsinki, working on simulations of phase transitions in the early universe.
  • Summer intern at Aalto CS working on DL. Need access to large scale computation.
  • Summer intern at Aalto Department of Applied Physics
  • Bachelor student at Aalto
  • Thesis worker
  • Summer research assistant at ELEC department @Aalto
  • PhD student at Tampere University
  • I am a summer intern at NBE department at Aalto
  • I am new Aalto ITS employee.
  • I am a doctoral student at Aalto
  • I am Visiting Researcher at Tampere University
  • I am a Postdoc at Aalto. I have already used HPC clusters, but it is good to rehearse a bit.
  • Summer intern at NMR research unit, University of Oulu. Looking forward to getting a more general idea of HPC-related things.
  • Summer intern at Department of Applied Physics @Aalto. Looking forward to learning the Triton environment.
  • Summer intern at Department of Computer Science.
  • Summer intern in Physics
  • PhD Student at Tampere University, looking for a refresh on Slurm and cluster usage in general
  • Summer intern at Tampere University, working with HPC cluster environment
  • Summer intern CS department, background in Cognitive Science / Sociology
  • Summer research assistant @ Aalto
  • Master's student @Helsinki working as a summer research assistant. Videos were good!
  • Summer intern at UH CS Department, CS Masters Student
  • I'm a postdoc researcher in Aalto. I'm willing to learn how to use clusters for calculations on nanophotonics.
  • Summer research assistant @ Aalto!
  • I am a PhD student at Aalto
  • Researcher at H
  • Masters student at Tampere University
  • Summer research assistant @Aalto @NBE
  • 1st year PhD student in Aalto
  • Summer intern and thesis worker at Aalto
  • summer intern in Helsinki's CS department, Mathematics and CS background
  • PhD student working on Biomedical Text Mining and need computing power
  • Summer research assistant at Aalto.
  • Summer research assistant at Helsinki University
  • Summer Intern at Aalto
  • Summer intern at Aalto
  • PhD student at Tampere University
  • Summer intern at University of Helsinki, bachelor's student
  • IT Staff
  • Summer research assistant at Helsinki
  • Summer trainee at Aalto
  • Summer internship in University of Helsinki. CS and Math Student
  • Summer research assistant at NBE department in Aalto.
  • summer research assistant in the NBE department at Aalto
  • I am a master's student at Helsinki University. Currently, I am working as a Research Assistant.
  • Summer intern at Aalto
  • Summer intern at Aalto NBE department
  • Summer research assistant at Aalto CS department
  • Summer intern at QCD group at the Department of Applied Physics in Aalto
  • Summer intern at Aalto ELEC department.
  • Data Science student at HU
  • PhD student at aalto.
  • How long will these sessions take?
  • Summer intern at the department of applied physics at Aalto.
  • summer project and I would like to use some of the resources
  • IT Staff
  • Postdoc. Will work with clusters.
  • Summer intern at Aalto NBE department
  • PhD student @ Helsinki. Would be glad to learn more about the HPC and how it works.

Intro

Ask anything, and always write at the bottom. Please include your organization in the question, as there can be differences between the Aalto/Helsinki/Tampere/Oulu university clusters.

  • this is a question
  • and here another one
    • a reply
      • a comment to the reply
  • (Oulu) We are supposed to connect to Zoom with our name being "(University) First Last" but my display name is locked by administration and cannot be changed. Is this a problem? Is there an obvious workaround?
    • enrico: Please get in touch with instructor: Pekka S. from Oulu. He will also be in the zoom room with us.
  • Hello world! We don't need Zoom now?
    • simo: Yes and no. You can join us in the zoom meeting for help with the exercises / having a discussion with your group members. You can use this HackMD for questions. You can also just watch the stream.
  • Are you going to provide guest Triton account for the exercises?
    • Have you registered to the course? Please request a Triton account as mentioned in the prerequisites
      • Every Aalto participant who registered before midnight today got a Triton account. Others will get it later today.
  • (Aalto) Can I get credits from taking the course?
  • Tampere University: who do I contact to fix an issue with my Narvi account? (ssh key login works, but got "password expired" error)
    • Check the instructions at TCSC
  • (Helsinki) Hi, If I miss some parts today but I want to watch it anyway later, can I get this livestream somewhere to watch?
    • The stream will be recorded and published on our YouTube channel
    • Before that, it is stored on Twitch for 14 days. See course page for the link.
  • Where can the lecture slides be found?
  • Simppa & Ivan: your mics are bit loud
    • Is it better now?
    • Yes :)
    • I had to fix it on the stream side, it's a bit interesting to try to keep stuff in balance.
  • If twitch is not behaving well for you, try reloading the stream https://www.twitch.tv/coderefinery You can also use the controllers (bottom right) to reduce the stream quality.

(moved to bottom)

HPC crash course

Slides: https://users.aalto.fi/degtyai1/SCiP2021_kick.HPC_crash_course.2021-06-04.pdf

  • If you saw some chaotic video artifacts, it was the host resizing the Zoom screen share.
  • Small clarification: In University of Helsinki we are also aiming to have only one cluster, turso. Current plan is that kale is the last separate entity, and in the near future all other "separate" clusters will be part of turso.
  • What is the policy of work on clusters? Is it sort of booking for each of the user, or it is the automatic queueing of the tasks?
    • The "workload manager" (Slurm in our case) handles this. Ivan will probably mention it soon now
    • This is the major topic of days 2-3
  • Does sinteractive open a different node rather than login node?
    • Yes. sinteractive makes a reservation to the queue and gives you resources from a compute node for interactive usage. We'll talk about interactive jobs on day 2.
  • Note: In university of Helsinki, sinteractive is not in use. We use "interactive" command instead.
  • What's the difference between the term "accelerator" and "GPU"?
    • simo Accelerator refers to a larger class of hardware that can be used to accelerate computations. Nowadays GPUs are the most common form of accelerator, but for example Google uses its own TPUs (Tensor Processing Units) to accelerate machine learning training. GPUs or GPGPUs (general-purpose GPUs) are usually optimized for vector / matrix calculations. Some other accelerators (such as ASIC cards or FPGA cards) can do e.g. crypto mining or other specialized computations even faster than GPUs.
  • Enrico, you are much quieter than the others, I can't hear you well. Better now, thanks!
    • Thanks for reminding me!
  • Is it possible to convert a serial job to a parallel job? Also, does rstudio support GPU computing?
    • simo It depends. If the framework you're using supports it, the change can be trivial, but if it doesn't, it can be very hard. R has multiple packages for parallel execution such as future and furrr. There are many GPU packages as well. It really depends on your problem.
  • Is it possible to use scikit learn on many CPUs? For example, I want to perform GridSearchCV.
    • simo Most of the classes / functions / solvers in scikit-learn have parameters that define how many CPUs you want to use. GridSearchCV has this kind of parameter as well (n_jobs). We'll talk on day 3 about how you can reserve multiple CPUs for your jobs. Combining this with the parameter for GridSearchCV will make parallelization quite simple.
  • What is the difference between cluster and a node in terms of computing, eg is it better/faster/more efficient to use cluster or a designated node only for me?
    • mikko A cluster is multiple nodes (= servers) joined and linked together. For a computing task, the best place and way to run it really depends on the task. Your own computer (laptop or desktop) is totally fine if you are able to do your task in a reasonable amount of time. But if you need more resources, the computing cluster might be the tool you need. And to clarify, you can also claim a dedicated node within the cluster for your computing task. We'll return to these details in the coming days.
    • simo Node is a single computer / server, whereas cluster consists of multiple nodes, storage, network connecting the nodes, login node etc.
  • You mentioned shell scripting but a lot of the commands you used in the videos about basics for the Shell used GNU coreutils such as rm and cp which are .c files. What's the difference? Are these still shell scripts?
    • Yes. rm and cp are compiled programs (written in C), but a shell script is just a text file of commands: if you write them into a file and run it, the shell processes the commands one after the other. Bash (one of the shell scripting languages) also gives you control flow such as if statements and while/for loops.
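The answer above can be illustrated with a minimal Bash script. It assumes Bash and GNU coreutils. mkdir, cp, rm, ls and wc are separate programs that the script simply calls in order, while for, if and [ ... ] are handled by Bash itself. The scratch directory and file names are made up for the demo.

```shell
#!/usr/bin/env bash
# A shell script is a text file of commands run one after another.
# mkdir, cp, rm, ls and wc are separate programs (GNU coreutils);
# for, if and [ ... ] are handled by Bash itself.
set -euo pipefail

workdir=$(mktemp -d)            # scratch directory, so the demo touches nothing else
cd "$workdir"

mkdir -p input backup           # coreutils: create two directories
for i in 1 2 3; do              # Bash loop
    echo "sample $i" > "input/data_$i.txt"
done

for f in input/*.txt; do
    if [ -s "$f" ]; then        # Bash test: is the file non-empty?
        cp "$f" backup/         # coreutils: copy it
    fi
done

rm input/data_1.txt             # coreutils: delete one original
ls backup | wc -l               # number of backed-up files (3 here)
```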
  • Will we have exercises on how to make (convert) serial jobs to parallel jobs using linux?
    • On day 3, yes. It assumes your code can already run in parallel; we talk about how to get the computer to do it.
    • simo Different programs / programming languages have different parallelization strategies. The tools depend heavily on your program. What program are you using? Requesting resources for these parallel jobs is similar to all languages. We'll go through this process on day 3.
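In the simplest case the pieces of work are independent and do not need to communicate ("embarrassingly parallel"), so the shell alone can run several of them at once. The sketch below is minimal. process_chunk is a hypothetical placeholder for your real program, and sleep stands in for its work. On a cluster you must also request the matching number of CPUs from Slurm, which day 3 covers; this sketch does not do that.

```shell
#!/usr/bin/env bash
# Run independent tasks at the same time with & and wait for all of them.
set -euo pipefail

# Hypothetical placeholder for a real serial program working on one chunk of data.
process_chunk() {
    local id=$1
    sleep 1                                  # pretend to work for one second
    echo "chunk $id done" > "result_$id.txt"
}

cd "$(mktemp -d)"                            # scratch directory for the result files

start=$SECONDS
for id in 1 2 3 4; do
    process_chunk "$id" &                    # start each task in the background
done
wait                                         # block until every background task has finished
echo "4 chunks took about $((SECONDS - start)) s (one after another: about 4 s)"
cat result_*.txt
```

This only helps when the chunks really are independent. If they must exchange data while running, the program itself has to support parallelism.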

CSC resources

Slides: https://kannu.csc.fi/s/3K8q93XSwtSgHEa

  • Please remember to put your hackmd to view mode after you have done edits!

Short poll

Please answer by writing o after the question

  • I have used CSC services
    • Yes: oooo
    • No: o
    • Don't know:
  • I have used some cluster in FGCI
    • Yes:
    • No:ooo
  • I am interested in:
    • technical details of CSC supercomputers:
    • CSC data management services: oo
    • How to submit a batch job: oo

Questions

  • I have not used CSC or cluster but just a node. I want to learn how to use cluster and what is the difference.
    • Today was a high-level overview, it will become more clear once we start doing it tomorrow.
    • A cluster consists of a set of nodes; a cluster is the sum of its nodes.
    • If you mean how to use multiple nodes simultaneously, you will need to create or use a program that can utilize them. You need to tell the program how to split the work and how to combine the results.
      • On day 3 we'll go through how you can request multiple CPUs / multiple nodes. We'll also describe commonly used parallelization strategies.
  • I still do not get what CSC is.
    • The state-owned company producing supercomputing and networking resources for academic use
    • It provides a lot of computing, data, and computer services to the academic sector in Finland (free for researchers!)
    • Owned by the Finnish Government, so is "free" to us
    • It took a long time for me to understand it, too!
  • Can you reveal us what CSC stands for? Wikipedia didn't satisfy us
    • It used to be centre for scientific computing, but you are not supposed to use that meaning. It is just CSC now, afaik.
    • I'll ask Jussi, I am interested to hear, too.
    • The tagline is/used to be "CSC the IT Center for Science" which describes it a bit better.
  • For Aaltoers: ePouta is coming soon to Aalto. I (enrico) am testing it. If your supervisor has these "sensitive data storage" needs, get in touch!
  • As an Aalto student am I only supposed to use Triton, or should I also try out e.g. Puhti?
    • I'll bring up the question
  • The concept of "node" is still unclear for me. And also "Bandwidth"
    • A node is a physical machine that is part of the cluster. It has multiple CPUs (and possibly GPUs) and access to shared storage. The picture here might give a better idea; the nodes are visualized as computers: https://scicomp.aalto.fi/triton/usage/workflows/
    • Bandwidth is a measure of the speed of a telecommunication channel. Large bandwidth -> lots of data can be transferred fast. It is called bandwidth because in transmission you use a "frequency band" (like FM radio) and the width of the band is proportional to the amount of information you can transfer.
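Here is a back-of-the-envelope example of what bandwidth means in practice: transfer time is roughly data size divided by bandwidth. The numbers are hypothetical. The sketch uses integer shell arithmetic and decimal units (1 GB = 8 Gbit). It ignores latency and protocol overhead, so real transfers take longer.

```shell
#!/usr/bin/env bash
# Rough transfer time: time = data size / bandwidth.
# Example numbers are hypothetical; latency and protocol overhead are ignored.
set -euo pipefail

# transfer_seconds SIZE_GB BANDWIDTH_GBIT_PER_S
transfer_seconds() {
    local size_gb=$1 bw_gbit=$2
    # 1 byte = 8 bits, so SIZE_GB gigabytes = 8 * SIZE_GB gigabits
    echo $(( size_gb * 8 / bw_gbit ))
}

echo "100 GB over 1 Gbit/s:  $(transfer_seconds 100 1) s"
echo "100 GB over 10 Gbit/s: $(transfer_seconds 100 10) s"
```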
  • Is the CPU frequency limited? Is there a reason why it is 2.1 GHz and not more?
    • The frequency is most likely the maximum all-core frequency of the specific processor chip, i.e. the maximum frequency it can sustain under 100% utilization.
    • Choosing the best hardware for a new cluster/supercomputer is a multi-goal optimization. I'm sure CSC chose the best processor in the big picture.
    • High clock frequency means higher power consumption and higher running temperature, i.e. more need for cooling. Thus, the choice is mostly dictated by energy efficiency.
  • Is the triton a part (or a server) of Puhti or Mahti?
    • Triton is not part of those. From the user's point of view, they are completely independent (same idea, small differences)
      • Triton is maintained by Aalto Scientific computing, Puhti and Mahti are maintained by CSC. All are separate clusters. CSC has more resources, local clusters can usually give more user support to local researchers.
    • Here is a visualization:
    • A visualization of the Triton cluster
  • If I want to do a specific computation but have no idea what the best hardware for it is, who could I talk to to figure out where to submit my job?
    • At Aalto, come to our "daily garage" and chat with us.
    • Users of Helsinki university may also join daily garage events for any questions.
    • How about Tampere university?
      • You could contact either via Help-desk or send email to TCSC email address which you can find e.g. in TCSC EduuniWiki
  • So a cluster is a set of computers, and a node is a computer?
    • Basically, yes. Making a single large "computer" is expensive and doesn't provide much benefit compared to the cluster strategy.
    • simo Node is a single computer / server, whereas cluster consists of multiple nodes, storage, network connecting the nodes, login node etc.
  • Will these questions be backed up?
    • Yes, we save them and share them with the youtube video.
  • Sorry, what does pre-exascale mean? In regards to Lumi.
    • "Exascale" refers to the capability to perform a billion billion, i.e. $10^{18}$, floating-point operations per second. "Pre-exascale" systems such as LUMI are the generation just below that level.
  • Error trying to access puhti with my csc password: "Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password)". Any clues on the why? I'm trying to access it via VSCode, following their instructions here: https://docs.csc.fi/support/tutorials/remote-dev/
    • If you have many ssh keys, you might want to use ssh -o IdentitiesOnly=yes ... when making the connection. Otherwise ssh will try out all of your ssh keys for authentication which might result in too many connection failures.
      • Other option is to use ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no ..
      • If the access fails when you type your password, you might want to check from my.csc.fi if you have accepted the user agreement there. CSC servicedesk can also help.
    • at least in cPouta you get that error if the virtual machine firewall has not been configured, you need to add security settings to enable connection from your IP address and SSH port 22. Don't know if that is the case in puhti
  • Is CSC services a good solution for long-term data storage (30+ years for instance)?
    • For digital preservation, see https://www.fairdata.fi/en/fairdata-pas/ (Fairdata PAS). The main question to ask ourselves is about the value of such data: is it worth reusing in 30 years? With this in mind, one has to prepare it so that software from the future can load it and use it. If you are at Aalto, our data agents network is happy to help you with this: researchdata@aalto.fi
  • What is the link to the CSC docs?
  • How to create disk mounting for Windows in CSC? I'd be happy to have a samba solution (such as Triton has) for Puhti/Mahti is it possible?
    • simo I'm not completely certain what would be the best course of action on CSC machines. I would most likely either work with their remote work tools (e.g. NoMachine) so that data transfer would not be necessary, or use some sshfs client for Windows. The CSC service desk might have a better answer. Samba is usually not used unless you can be in the same network as the system (with VPN, for example).
    • mikko CSC people can answer better, but I'd assume this would not work that well. There is most likely a long physical distance between your machine and CSC (the data is in Kajaani, some 500 km from the Helsinki area), so using something like a Samba share would not be feasible. For Aalto users, I recommend visiting our daily garage if you are planning large-scale (or even small-scale) data transfer to/from CSC.
    • JussiE As Mikko said, the physical distance alone introduces quite a bit of latency. Some sshfs based solution might work "OKish".

Git crash course

There is no link to this material, we will copy stuff here as we discuss it.
Please ask questions in this HackMD.

Quick poll on git usage

Please answer by writing o after the question

  • I have used git before:
    • Yes:ooooooooooooooooooooooooooooooo
    • No:oooooooooooooooooooooooooooooooooooooooooooooooo
    • Yes but no idea what I am doing: oooooöoooooooooo

Questions:

  • Is this git basics or something more advanced? I haven't had lunch so I wonder if the beginning of this can be skipped as I have quite good routine with git already
    • This will be a very basic introduction. It is highly recommended to listen if you have never used git as it is a very valuable tool.
      • I switched to the scientific computing workflows video, another chance with this digital format is kind of pick-and-choose between the basic parts
      • Yes that is a good point, ideally things can be broken into micro bits and one picks what they need if they want to.
  • Who created git and why?
    • Linus Torvalds to manage linux development I think, correct me if I'm wrong.
      • Yes.
    • Other alternatives (which pre-date git) are SVN and CVS. Not sure why they are all 3 letter words :)
      • The other version control systems have less "colorful" naming history than git
  • Probably 'sudo apt install git-all'
  • For installation instructions: https://coderefinery.github.io/installation/shell-and-git/ for all major OS
  • Can I use Cubbli from HY for git training?
  • git init - used to make a new repository
  • git clone https://github.com/AaltoSciComp/hpc-examples.git
  • I have git version 2.19.2. Should I update it?
    • Anything > 2 is effectively good enough. I wouldn't bother, at least not unless you want some brand new repos
  • could you copy here the link to the repository you're cloning?
  • What's the difference between fork and branch?
    • Branch is a split in history of the files/commits in the repository. Fork is a full copy of the repository (branches included). In GitHub you can fork publicly available repositories if you want to edit the project source code yourself.
  • What's the difference between fork and clone?
    • clone: make a local copy of a repository to somewhere else.
    • fork: usually when we say "fork", we mean a copy made on GitHub/GitLab that the platform keeps linked to the original repository. In a more general sense, it is a divergence of a project.
  • Learn git by yourself with videos and materials from CodeRefinery: https://handsonscicomp.readthedocs.io/en/latest/C/ (module C21)
  • Are we supposed to follow along and do the same things? This is a bit fast.
    • This is more of a demo to give you an idea. You can re-watch it at some point if it is useful, or check link above for more videos.
    • Yeah, we also run CodeRefinery courses, which take 3 half-days to go over git intro. This is a fast intro. -> Youtube playlist here
    • simo It is better to follow and try to get a grasp of what git tries to do.
  • fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists. I got that error when cloning the repository.
    • Remember to use HTTPS cloning if you do not have SSH with GitHub enabled.
    • I had the same problem using HTTPS when I gave a specific directory to clone into. When I removed it, it worked.
      • If you run git clone https://github.com/AaltoSciComp/hpc-examples.git, git will try to clone the repository to the folder where you're running the program in. If you run git clone https://github.com/AaltoSciComp/hpc-examples.git some_folder, it will try to clone the repository to some_folder.
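You can see the difference without a network connection. The sketch below creates a small local repository as a stand-in for the GitHub one and clones it both ways. The demo user name and e-mail are placeholders that git needs only to record the commit.

```shell
#!/usr/bin/env bash
# Where does git clone put the repository? A local repo stands in for GitHub here.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"

# Create a small repository to clone from (stand-in for the GitHub one).
git init -q hpc-examples
git -C hpc-examples config user.name  "Demo User"          # placeholder identity
git -C hpc-examples config user.email "demo@example.com"
echo "hello" > hpc-examples/README.md
git -C hpc-examples add README.md
git -C hpc-examples commit -q -m "Add README"

mkdir work && cd work
git clone -q ../hpc-examples               # no target: creates ./hpc-examples
git clone -q ../hpc-examples some_folder   # explicit target: creates ./some_folder
ls                                         # lists both hpc-examples and some_folder
```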
  • I am getting the following error : "git: 'restore' is not a git command. See 'git help'.The most similar command is remote"
    • simo Your git version might be too old. I see the same error. Try using git checkout -- file_name instead.
  • I am getting this error git: 'restore' is not a git command. See 'git help'
    • Same
    • Same here
      • simo Your git version might be too old. I see the same error. Try using git checkout -- file_name instead.
  • $ git checkout README.md error: unknown option `README.md'
    • simo Run git checkout -- README.md. Spaces around -- are important. Otherwise git will think that you want to give it an option --README.md.
    • space between -- and filename
  • ok and it does the same thing as restore?
    • simo Yes. git restore is a new command with a bit more descriptive name than git checkout. git checkout is also used to switch between branches.
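You can try both ways of discarding an uncommitted edit in a throwaway repository. This sketch uses git checkout -- README.md, which works on old and new git. It then tries git restore (available from git 2.23 onwards) and falls back to checkout on older versions. Names and file contents are made up.

```shell
#!/usr/bin/env bash
# Discard uncommitted edits to a file in a throwaway repository.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.name  "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "original line" > README.md
git add README.md
git commit -q -m "Add README"

echo "accidental edit" >> README.md
git checkout -- README.md        # the spaces around -- separate options from file names
cat README.md                    # back to the committed content

echo "another accidental edit" >> README.md
if git restore README.md 2>/dev/null; then   # only succeeds on git >= 2.23
    echo "restored with git restore"
else
    git checkout -- README.md                # fallback for older git
    echo "restored with git checkout"
fi
cat README.md
```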
  • Do I need to have a github account to use this?
    • No, not if you just clone locally. But if you want to push the changes to the central repository, then yes.
      • But you don't have to use Github as a central repo
        • Exactly. Github is just one of the options. In theory you just need any SSH server that can be your "remote" repository.
    • The course also doesn't require it
    • You can also use version.aalto.fi as a central source for your repositories.
  • where is all the past process stored? I missed it :_D
    • Past changes are stored by git in commits in the .git-folder in the repository folder.
  • After git log I am stuck. How do I escape/exit from here? It seems like I can't do anything.
    • does q exit?
      • Yes, thanks.
      • git log will most likely open a "pager" such as less that you can use to scroll through various commits.
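The throwaway repository below shows where commits live and how to print the log without the pager, so there is no need to press q. The file name and commit messages are made up.

```shell
#!/usr/bin/env bash
# Where history is stored, and printing the log without the interactive pager.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.name  "Demo User"           # placeholder identity
git config user.email "demo@example.com"

for n in 1 2 3; do
    echo "version $n" > notes.txt
    git add notes.txt
    git commit -q -m "Notes, version $n"
done

git rev-parse --git-dir           # prints .git: the folder where the commits are stored
git --no-pager log --oneline      # prints all commits directly, without opening less
git rev-list --count HEAD         # number of commits on the current branch
```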
  • git push: Repository does not exist?
    • git push will most likely not work, as you do not have access rights to the original repository. It also needs to be run from the repository folder.
  • I re-cloned/pulled the repository, but these changes did not appear in README.md. Why is that?
    • If you clone from "upstream", in this case from github, you will get the version that is available there. Unless you push the changes to central repository changes are not visible. However, you can clone Jarno's repo and it should now show changes that Jarno pushed.
  • What is the optimal moment to git push vs. only git commit?
    • Depends. You'll want to push if you want to share your changes with other developers / some other machine you're using or if you just want to make certain that changes will not be lost if something happens to your local copy.
    • Making changes/undoing/whatever is easy until you push.
      • You can revert changes after pushes, but the reverts will stay in git history.
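Undoing a commit that has already been pushed is usually done with git revert. It adds a new commit that reverses the old one, so both stay in the history, as the answer above says. Below is a minimal local sketch. The push itself is left out; in a real repository you would push the revert commit afterwards. File names and messages are made up.

```shell
#!/usr/bin/env bash
# git revert undoes a commit by adding a new commit; history is not rewritten.
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.name  "Demo User"           # placeholder identity
git config user.email "demo@example.com"

echo "good result" > results.txt
git add results.txt
git commit -q -m "Add results"

echo "bad result" > results.txt
git commit -q -am "Overwrite results (mistake)"

git revert --no-edit HEAD > /dev/null    # new commit restoring the previous content
cat results.txt                          # the file is back to the good version
git --no-pager log --oneline             # three commits: add, mistake, revert
```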
  • Common question: So if I have not pushed for a long time, can I push all the changes (in many files) at once or should I remember where I have made changes and add one by one. (PS. my files are very big)
    • Ideally you would have made multiple commits so that one could see the progressive changes, but sometimes we forget to do that, especially in a "one-person project". A good habit is to commit and push often. :)
    • When you're pushing, you're pushing commits. Each commit can have changes in multiple files. You can push multiple commits at the same time. You can use git status and git diff to see where changes have happened.
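The sketch below makes two commits that touch two files, then sends both with a single git push. A local bare repository (remote.git, a made-up name) plays the role of GitHub or version.aalto.fi, so nothing leaves your machine.

```shell
#!/usr/bin/env bash
# Several commits in several files, then one git push sends them all.
set -euo pipefail

cd "$(mktemp -d)"
git init -q --bare remote.git                 # stand-in for the central repository
git clone -q remote.git project 2>/dev/null   # hides the "empty repository" warning
cd project
git config user.name  "Demo User"             # placeholder identity
git config user.email "demo@example.com"

echo "x = 1" > analysis.py
echo "# Notes" > notes.md
git status --short                            # new, untracked files are marked ??
git add analysis.py notes.md
git commit -q -m "Start analysis"

echo "y = 2" >> analysis.py
git diff --stat                               # which files changed since the last commit
git commit -q -am "Extend analysis"

git push -q origin HEAD                       # both commits are pushed in one go
git --git-dir=../remote.git rev-list --count --all   # commits now in the remote
```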
  • Any etiquette regarding shared repos with git?
    • Many, I think. Personally (enrico), I like it when a project is inclusive and encourages changes, contributions and new issues from others. When I share a repository with a scientific paper, I like the repository to be "live" and to continue improving after the paper.
    • simo Having a shared git repository is IMO very good for your code quality. Having to discuss with other people how to manage a shared repository will make you write better code and, most importantly, document your code better. I'd say have a discussion within your group on common etiquette.
  • If you will work with code in your future career, git is one of the most useful things that can be taken with you.
  • Differences between github, gitlab, other git applications? Also gists?
    • Different hosts have different features (Actions, Pull requests, etc) and different user limits or user ToS. If you're working with sensitive data, public git hosts should not be used. Gists are great for sharing quick scripts that you have made.
    • Remember that Aalto has our own hosted gitlab platform: version.aalto.fi
  • Licensing with git.
    • You can add a license file to your repository and publish it on any of the mentioned services (GitHub, GitLab, version.aalto.fi). The file is usually called LICENSE or LICENSE.txt.
  • https://coderefinery.org
  • I guess there's no reason not to use git as a pretty general backup tool? For e.g. paper or thesis writing. Or should I rather use something else?
    • simo I use it for everything that is in text formats: my configuration files (e.g. .bashrc), my thesis / papers, code, small scripts, documentation. You can have multiple repos for multiple purposes. I have a hundred or so, some in version.aalto.fi, some in GitHub.
    • Definitely! You can't afford to lose it, git is simplest way and gives you power of seeing changes
    • Git works well with text, but not so well with more complicated data formats, like pdf or word documents.
  • If you like to see one example of git in use, you can check our documentation repository in GitHub: https://github.com/AaltoSciComp/scicomp-docs . Our documentation in scicomp.aalto.fi is updated from this repository after each commit.

Break

We resume at xx:03


Your future career in scientific computing

Link: https://scicomp.aalto.fi/training/scip/summer-kickstart/future/

Questions

Connecting to Triton

  • University of Helsinki participants can now come to the kickstart zoom room 1, if any problems with ssh etc.
  • If somebody from Tampere has problems connecting narvi, I'll be in Zoom breakoutroom 2
  • I've connected to Triton/Kale/etc what do I do now to check that everything is working fine for tomorrow?
    • You'll see shortly, but if hostname shows the right computer, then you know it works.
  • Can I use my personal computer (a mac in this case) for connecting (just to make sure it's ok for security etc.)
    • Yes. You'll have to connect through a jump machine first. In aalto, kosh.aalto.fi works.
    • Of course, if you have for example, ssh keys, you'll want to secure them with a passphrase on your own computer.
  • Should I use PuTTY?
    • On windows, putty is a good option.
    • It's one of the standard options, it works well and is recommended.
  • what is the difference between kosh and triton?
    • kosh is a so-called shell server that you can use for ssh connections. It is maintained by Aalto IT services. For security reasons, triton.aalto.fi is not available to public internet. kosh is. To get access to triton, you'll need to be in Aalto's internal network. So either via VPN or through a jump server such as kosh.
      • Actually, I'm already connected to the Aalto VPN and still need to use PuTTY and kosh to access Triton. But thanks for your clarification.
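If you connect from a Linux or macOS terminal rather than PuTTY, OpenSSH (version 7.3 or newer) can make the hop through kosh for you with ProxyJump. The sketch assumes you replace USERNAME with your own Aalto username. The host aliases kosh and triton are arbitrary names chosen here.

```shell
#!/usr/bin/env bash
# Print OpenSSH client settings for reaching triton.aalto.fi through kosh.aalto.fi.
# USERNAME is a placeholder (assumption): replace it with your Aalto username.
# ProxyJump needs OpenSSH 7.3 or newer; PuTTY on Windows is configured differently.
set -euo pipefail

ssh_snippet() {
    cat <<'EOF'
Host kosh
    HostName kosh.aalto.fi
    User USERNAME

Host triton
    HostName triton.aalto.fi
    User USERNAME
    ProxyJump kosh
EOF
}

ssh_snippet    # review the output before adding it to ~/.ssh/config
```

Save the block as, for example, triton_ssh_config.sh (a hypothetical name), replace USERNAME, and append its output with bash triton_ssh_config.sh >> ~/.ssh/config. Afterwards, ssh triton connects through kosh. The one-off equivalent without a config file is ssh -J USERNAME@kosh.aalto.fi USERNAME@triton.aalto.fi.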
  • Oulu: Will there be any breakout room for University of Oulu?
  • how to exit triton?
    • exit works. On linux, <CTRL> + D will work as well.
  • How long it takes to connect to triton? It's been 10 minutes and I cannot enter my password!
    • Not long. If it takes more than a minute, there is a problem.
  • I get "permission denied" when I try to connect to Triton with my credentials.
    • Are you in the Aalto network (use VPN or connect through a different machine)
      • Yes, I connected to aalto network with vpn. I am not a student in aalto though, but a summer intern. I don't know if this has an effect
      • Can you try id username in kosh.aalto.fi and check that you are in the triton-users group.
      • No, it seems that I'm not
      • You don't have access to triton yet. If you signed up for the course today, it will take a while longer. Otherwise let us know your username in Zoom.
      • Okay, thanks!
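The check suggested above (running id username on kosh and looking for triton-users) can be scripted. This is a minimal sketch. The group name comes from the answer above, and the script checks the current user when no username is given.

```shell
#!/usr/bin/env bash
# Check whether a user belongs to a group, e.g. triton-users.
set -euo pipefail

# in_group USER GROUP: exit status 0 if USER is a member of GROUP
in_group() {
    # id -nG prints all group names of the user, separated by spaces
    case " $(id -nG "$1") " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

user=${1:-$(id -un)}          # default: the current user
if in_group "$user" triton-users; then
    echo "$user is in triton-users"
else
    echo "$user is NOT in triton-users yet"
fi
```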

Feedback

Please tell us one good thing about the lesson today:

  • good format, timing and length
  • this format is really good, and having everything archived for later viewing is very convenient
  • hackmd is very useful to ask and get answers
  • The workshop was very informative for beginners like us.
  • very informative
  • Well-explained, thanks.
  • Very good feedback and support in all of the platforms.
  • I really appreciated that we had a break every hour!
  • Organised well and informative
  • Good run through of useful things to know/be aware of
  • Good explanation of all the basics. Very helpful tips. Nice to have many people helping in the zoom

Please tell us one thing to improve for next time. For example, what else would you like to see/what is not necessary. Or anything. Did the format work?:

  • the git demo can be more focused for learning maybe?
  • When we have to try out commands, as in the git crash course session, could the pace be a little slower so that we can follow what is shown on the screen while we try it on our own computers?
  • The beginning, with all the info on HPC, made the day a bit top-heavy. Maybe putting the git section between the two HPC talks would split it better.
  • The git instruction was pretty fast, but I guess it is understandable since time is limited
  • It would be nice to have fewer windows open at the same time. It is sort of messy to have all of this on the screen.
  • The CSC part could perhaps be a bit shorter; it is not as relevant if you are not a Finnish researcher. Everything else was perfect.
  • The HPC crash course part was a bit too heavy and overwhelming for a layperson
  • The slides could have been bigger on the twitch stream.
  • P.S many of the issues that are being had with Zoom may be due to running linux zoom which is terrible
  • Could it be possible for some Discord implementation for discussion or breakout rooms? Also voice rooms for support may be useful.
  • The twitch stream being portrait I don't think matters because we (the viewers) can always rescale it. If the stream was in landscape some of the images might be better for viewing.

Thanks to all for attending! See you tomorrow.


08/06 - Day 2

Icebreaker

which software applications/languages do you work with? What would you like to use on the HPC cluster?

  • Comsol, Matlab
  • Mostly Python
  • Matlab, Python, brain imaging tools (FSL, freesurfer, fmriprep), R
  • Python, Jupyter
  • Python, Matlab, VMD
  • Python, deep learning libraries (PyTorch, Tensorflow)
  • Python, R
  • Python, Matlab, is blender rendering possible?
  • Matlab, Python (and PyTorch)
  • Python, machine learning libraries
  • Jupyter / Python
  • Python, Matlab
  • Python, Mathematica, Matlab, Fortran
  • Python
  • CASINO (QMC), Quantum Espresso (DFT)
  • Matlab, R
  • Comsol
  • Python, C++
  • Python, Matlab
  • Custom C++ Code and some Python
  • Julia
  • Julia, Python, some Matlab
  • Geant4
  • Python
  • Python, Jupyter
  • Python, brain imaging tools (SPM, FreeSurfer)
  • Matlab
  • Fortran, C and Python
  • Python (spyder), rstudio, fortran
  • R
  • Python, C++, Rust, TensorFlow, TensorFlow Quantum, Numba, ROCm, Geant4, Pythia8, ROOT, CMSSW
  • R, Python, bioinformatics related unix programs
  • Matlab, Python/Jupyter
  • Python, C++
  • Python ML libraries and MLops with mlflow/airflow
  • Python, Matlab, R
  • R
  • C, Python, Matlab
  • Matlab, CST Microwave Studio
  • Python, Scala, Matlab
  • Comsol, Python
  • matlab, Scala, java
  • Python
  • Python
  • Mostly Python, maybe some Matlab
  • Python
  • R, Python (very little experience)
  • Python, Matlab, Fortran
  • Python
  • Matlab
  • Python, Matlab
  • python (Spyder), MATLAB, FSL, FSLEyes

Please continue feedback from day 1:

  • quite easy but interesting
  • pytorch

ask anything, write at the end

  • question

    • reply
      • comment on the reply
  • another question

  • "Where do I get the instructions that help to get connected to the Helsinki Cluster?"

  • Sound is very good now IMO

  • Any tips on how to still hear the stream while in Zoom? When I open Zoom, I cannot hear the stream.

    • There is no stream on Zoom. Zoom is for chatting with your colleagues while you watch the stream, or for fixing issues (e.g. an account not working).
      • I know, but the actual stream in a different window somehow goes silent.
        • In some previous workshop some groups picked a breakout room and one of them shared the stream there, like watching TV together.
  • I'm connected to triton, when running $WRKDIR I get 'zsh: permission denied: /scratch/work/username'.. will this be a problem? I can still access my folder I created before

    • You should type echo $WRKDIR. The variable $WRKDIR holds a string (a directory path), so it cannot be run as a command.
      • got it, thanks.
  • Is it a big issue if I am not able to connect to Aalto VPN?

  • Is it all going to be about Triton, or also about Kale?

    • Everything should work on both. You can ask about differences in the zoom breakout room for Uni of Helsinki
  • Is it better to use VPN or kosh?

    • It's a matter of taste. If you enable vpn, you only need to run one ssh command
  • Are the Zooms up already and if they are where can I get in?

  • What should I do if I want to keep on using cluster (helsinki) in future?
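On the $WRKDIR question above: typing $WRKDIR on its own makes the shell expand it to a directory path and then try to execute that path as a command, which is why zsh reports "permission denied". A minimal sketch of the difference; the fallback path is only an example value so the snippet also runs outside Triton, where the variable is normally already set for you.

```bash
#!/bin/bash
# On Triton WRKDIR is already defined; this fallback value is only for trying the snippet elsewhere.
WRKDIR="${WRKDIR:-/scratch/work/$(id -un)}"

# Wrong: the shell expands the variable and tries to run the directory as a command,
# which fails with an error such as "permission denied".
# $WRKDIR

# Right: print the value of the variable
echo "$WRKDIR"

# Or change into the directory it names
# cd "$WRKDIR"
```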

Have you connected to your cluster already?

yes: öoooooooooooooooooooooooopöooooooooooooooooooooooooooooooooooåo
no: ooooo
have questions: oo
forgot how to: o
maybe: o

Is there anybody from University of Helsinki with connection problems?

  • I am connected to melkki, but I am not sure if I am supposed to connect to it

    • melkki is not a cluster, you should connect to kale
    • Kale didn't let me in when I tried; it says the connection timed out
    • I launched the ssh agent before connecting with eval $(ssh-agent) and ssh-add path-to-private-key, after connecting to the VPN
    • Can anyone help me get connected in the breakout room?
  • Is an internet connection needed to use ssh? I don't have internet access with my work laptop at home

    • yes you need an internet connection because it is a remote system
  • Fine for me everything. I used turso, tho

  • Are you using vpn?

  • yes to kale, not to ukko2

  • Turso and Kale work for me

I'll be in Zoom room 1 for Helsinki people at 1230, so please come in case of ssh problems.

The recommended jump hosts for Helsinki users are melkki.cs.helsinki.fi, melkinpaasi.cs.helsinki.fi, pangolin.it.helsinki.fi, markka.it.helsinki.fi, login.physics.helsinki.fi

  • What is jump host?

    • A jump host is a shell server used to get through the firewall. The clusters are not visible outside of the university firewalls, so you have to get inside first, either with the VPN or by using a jump host.
  • We will have a connection exercise break around 12:30

  • Is there an echo for anyone else?

  • yes, someone is not muted on zoom who should be

    • I have muted all
  • The command "chsh -s /bin/bash" gives me: changing user attribute failed: Permission denied, what could cause this?

  • Does connecting via kosh while on a VPN on my work PC cause confusions/problems?

    • No confusion
  • Please, comment the possibility of connecting to Triton via VSCode with the SSH extension. It makes life way easier for Windows users. More instructions here: https://code.visualstudio.com/docs/remote/ssh.

  • use cygwin for easy linux on windows?

  • Is it possible to connect to Jupyter at the narvi (Tampere) cluster?

    • not as easily as in Aalto, but I know that people have used it via ssh-tunneling.
  • I can connect to Triton fine after I connected to VDI. But I can not connect e.g. to kosh.aalto.fi directly from my laptop (it says "permission denied")

    • I would use the VPN just in case. But you can do exercises also from VDI
      • Okay. I just have the feeling that VDI is "a bit slow"
    • I got the same issue as you, but I managed to connect using the -l (username) argument
      • yes if your local username is not equal to remote username, then you need to run ssh REMOTEUSERNAME@kosh.aalto.fi or with the -l option (REMOTEUSERNAME is your Aalto username in this case)
      • It worked, thank you :-)
  • I am concerned about losing connectivity because the internet at my home is not very good. Would it be OK to use screen?

    • screen is good. tmux also. VDI also keeps the session up for 24h.
      • Would it be possible to have a link to some basic but good tmux config file? I used it a long time ago and remember that setting it up nicely was a bit of a chore, especially for use with vim.
    • Aalto: I recommend using screen on kosh (or taltta if you are at Aalto), as they get rebooted less frequently than the Triton login node
    • screen -r screen_name
  • Is it possible to setup SSH keep-alives for University of Helsinki?

    • our servers do not have keepalive configuration, but you might try it on the client side. But mosh is recommended, if the connection is unstable. The only jump host answering to mosh is melkinpaasi.cs.helsinki.fi.
  • How to exit the new screen I created?

    • You can just log out. Or if you want to detach, use <CTRL>+a followed by d. So first <CTRL>+a, then "d".
      • In tmux, the command is <CTRL>+b followed by d.
  • Is Ubuntu for Windows 10 a good choice of terminal? Or should I get PuTTY?
    • Personally I prefer MobaXterm, an SSH client: https://mobaxterm.mobatek.net/

  • If I lose the connection while I am inside the screen (not detached), can I still attach to that screen later when I log in again?

    • yes. I use screen -x for "attach or re-attach"
  • You mentioned some exercises, but where do I find them? So far I thought the only exercise was to connect; are there more?

  • I'm connected to triton, but I can not use nvidia-smi, and torch.cuda.is_available() gives me false, how can I use GPU to do computing?

    • Mikko We will be returning to this when talking about GPU computing. But the short answer is that we do not have GPU cards on the login node. Thus, you cannot run CUDA code there (nor should you run any heavy code whatsoever on login3!). But when connected to a GPU node you will have these available. If you wish to jump ahead, you can of course check scicomp.aalto.fi for how to run GPU code :)
    • The login node is just an entrance to the cluster:
      A visualization of triton cluster
  • What is screen?

    • Screen is a terminal session that stays running in the background. Normally, when you close your laptop your remote (ssh) connection gets closed, and next time you need to reconnect. Screen is a way to leave your working terminal running on the remote machine, so after reconnecting you can resume where you left off. This is a somewhat advanced use case and not necessary at all, but in some cases you might get a better working setup for yourself using tmux or screen.
  • For Github acc settings, should we do this on our computer or on cluster we connected to?

    • If you want the remote computer to be able to pull/push to github with your account, then you need to do the setting up on the remote computer.
  • My current shell is zsh. Do I need to change it to bash?

    • We recommend switching to bash, you can do it later -> https://scicomp.aalto.fi/triton/tut/connecting/#change-your-shell-to-bash-aalto
      • what is the command to change shell? what is the difference between 2 shells (zsh and bash)?
      • there are many flavours of Linux shells. Bash is (maybe) the most common. https://www.howtogeek.com/68563/htg-explains-what-are-the-differences-between-linux-shells/
      • Quite often if you want to take examples from our documentation or e.g. stack overflow you are better off with bash. As that is the most common the examples and tutorial are easier to "copy-paste". But all works with different shells also. If you know what zsh is and wish to use it, then go ahead. If you do not know what zsh is, then it's better to switch to bash.
      • actually I had some problems with anaconda module load using bash shell, so my instructor told me to change it to zsh.
        • This is getting into very detailed tuning. But if your supervisor has a strong preference here and has things working with zsh, then maybe just go with that. You should not get hung up on bash vs zsh vs something else.
        • okay I'm just not sure if there's big difference between bash and zsh. but now it's clarified.
  • It says "lupa evätty" (Finnish for "permission denied") when writing $WRKDIR

    • On which machine? $WRKDIR is a string, so you would need to do echo $WRKDIR
  • I tried to change shell to bash but it didn't work, it just said authentication failure

    • On which machine?
  • Can you please mute yourself when typing but not talking?

    • simo I'll try.
  • Off-topic question: is there a place where you can ask for advice regarding coding/software related questions not related to the Summer Kickstart? (Uni of Helsinki)
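For the shell-switching questions above (chsh, zsh vs bash), it can help to compare the login shell recorded for your account with the shell of your current session. A minimal sketch, assuming getent is available as on typical Linux systems; login_shell is a hypothetical helper name. A change only affects new logins, so reconnect before checking.

```bash
#!/bin/bash
# login_shell [USER]: print the login shell recorded in the user database
login_shell() {
  local user="${1:-$(id -un)}"
  # Field 7 of a passwd entry is the login shell
  getent passwd "$user" | cut -d: -f7
}

echo "Configured login shell: $(login_shell)"
echo "Shell of this session:  ${SHELL:-unknown}"
```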

Have you connected already?
yes: ooooooooooooooooooooooooåoooo
no: oo
problems:
maybe: oo

If you type the command hostname, you can see whether you are connected to the right remote computer.
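In a script you can make the same check automatically before doing anything heavy. A minimal sketch: expect_host is a hypothetical helper, it uses uname -n (which prints the same node name as hostname), and the pattern you pass depends on your cluster; 'login*' below is only an illustration.

```bash
#!/bin/bash
# expect_host PATTERN: warn (and fail) if this machine's name does not match PATTERN.
# PATTERN is a shell glob such as 'login*'; the right value depends on your cluster.
expect_host() {
  local current
  current=$(uname -n)   # same node name that the `hostname` command prints
  case "$current" in
    $1) return 0 ;;
    *)  echo "Warning: running on '$current', expected '$1'" >&2
        return 1 ;;
  esac
}

# Example: stop a setup script if it is accidentally run on your laptop
# expect_host 'login*' || exit 1
```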

Applications

https://scicomp.aalto.fi/triton/tut/applications/

  • How to check applications in helsinki?
  • Can/Should you render Blender scenes on Triton?
    • I know people do, but I don't know how myself. Often, these things are possible, but you need to adapt the software to run headless (without a display available).
  • Great feedback on platforms to interact publicly with other cluster users and cluster admins. What are the available channels in TUNI?
    • You could reach out to administrators via email tcsc.tau @ tuni.fi, or ask at it-helpdesk. In more general issues you could also use users mailing-list narvi-users.
      • So we don't have something like Zulip/MatterMost or similar for direct interaction among users at Tampere?
        • Not currently, but it could probably be arranged if it seems necessary.
  • Not able to hear in twitch
    • I forgot to hide us, there isn't supposed to be anything
  • When we get to GPU computing, could you also explain how to use multiple GPUs, like distributed data parallel in PyTorch? Training a large deep learning model usually needs a big batch and a lot of memory, so one card is probably not enough.
    • simo We'll talk about how you can do a multi-gpu reservation, but most likely we won't have time to go into details of a specific framework. However, this kind of a problem is the kind of a problem we'd want to solve in garage and then create a documentation based on it.
  • is there an "Application page" equivalent link for Tampere?
    • Unfortunately no, but you can refer to Aalto's list ;-) Most of those are also on narvi.
  • Will it say something when the shell is changed?
    • It won't affect currently running shells. Next time you connect, you can run echo $SHELL to see if the shell has changed. The visual layout of the shell might also change.
      • so i just have to wait 15 minutes and hope its done?
        • Basically, yes. After 15 minutes or so, reconnect and see if the change has happened.
          • do i have to start over if it hasn't?
  • How to find what kind of modules are available?
    • We'll talk about this a bit later on, but you can try module spider or module avail to see available modules.
  • After module load anaconda, I can see that tf is available with pip list, but I don't know how to use tf from the console / how to import tf
    • If you're unfamiliar with Python and won't be using it, the exercise is not really that important. However, if you're interested, you can run something like python -c "import tensorflow; print(tensorflow)"
  • I have been completely unable to do the exercises: python on Triton says that there is no package "tensorflow" What do you mean with "make it work enough"???
    • Try running module load anaconda. If you're not going to be working with Python, this exercise is not that important. What we mean by "make it work enough" is to get no errors when you run python and import tensorflow. We'll fix the ambiguity later on. The point of the exercise is to check the documentation. If the documentation is not up to date or the commands do not work, let us know.
    • Thank you for the reply. There was not enough time and the question was unclear for beginners.
  • Mikko General notice from support questions: when thinking about how to run graphical software (e.g. RStudio, Spyder, Matlab) on Triton, the best way may be "not to". Quite often Triton is used for heavy non-interactive calculations (again, we'll return to this). You can use your laptop/VDI/desktop for editing your code and ssh simply to submit your large calculation. For many Aalto departments the data directories are visible from VDI or can be made available remotely (see scicomp.aalto.fi for details). That is a much easier workflow than trying to run graphical things over ssh.
  • How to check if a software is available in triton? Are software and module the same?
    • Modules are a handy way of making the software available. So, if you search for modules and find the right one for you (e.g. Matlab/2021a) then just load that and go. In addition you can install/compile your own software. This would be common when developing code.
    • We'll discuss modules in detail later on. Basically modules will make the software available when you load them.
      • does it mean if I want to check if software is available, I just need to check if corresponding module is loaded? can different modules load the same software?
  • I cant access the twitch stream anymore from the email link; it says 503 service unavailable
    • AWS/Amazon/Twitch currently experiencing some problems.
    • simo There seems to be a big outage in many internet services. Maybe AWS has problems.
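The goal of the applications exercise, importing tensorflow without errors, can also be checked directly from the shell. A minimal sketch, assuming python3 on your PATH is the interpreter you want to test (on Triton, after module load anaconda); has_python_module is a hypothetical helper name.

```bash
#!/bin/bash
# has_python_module NAME: succeed if `import NAME` works in the python3 on PATH.
# It really imports the module, so it catches errors raised during import, not just absence.
has_python_module() {
  python3 -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$1" >/dev/null 2>&1
}

# Example (on Triton):
# module load anaconda
# has_python_module tensorflow && echo "tensorflow imports fine" || echo "tensorflow import failed"
```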

Exercise and then break until xx:00 https://scicomp.aalto.fi/triton/tut/applications/#exercises

Hint 1: Switch this HackMD to view mode (instead of edit mode) and click the link instead of copying it.
Hint 2: write at the bottom, easier to check new content.

Note: we will do modules tomorrow, and the "Asking for help with supercomputers" right after the break. This should help us get to the interesting stuff today.

Asking for help with supercomputers

Slides: https://cicero.xyz/v3/remark/0.14.0/github.com/bast/help-with-supercomputers/main/talk.md/#1

twitch works:
yes:ooo0ooooooooooOooooooooooooooooooooooooooo
no:oooooooo

  • getting error 503

  • Error 503 Service Unavailable

  • It worked but I made the mistake of refreshing the page, then it gives the error.

  • Let's hope it comes back soon. Anyway, we will have videos.

  • Don't refresh the stream! It will not work afterwards.
    • I clicked on the Twitch link from my email and am still getting error 503

  • rkdarst I get reports that internet at my apartment also has problems for new connections. I wonder if there is some massive finland-wide thing?

  • Can you confirm which slide we are on?

    • 19/19
  • Should I run my code on the login node?

    • It is usually better to use srun to run it on a compute node, or to get an interactive session with sinteractive. We'll talk about these next. The login node is shared by all users, so even one user can hinder lots of others. I (simo) try to only run stuff on the login node that takes less than my attention span for watching it (maybe a minute) and that uses only one CPU.
    • OK, so far I did not know how to use the cluster, and for a few months I just used the login node to run my things as I would run them on my laptop. I guess most other people who have never heard about "srun" or "sinteractive" do the same. It is strange that it is so easy to run things on the login node without ever noticing that I am doing something wrong. There should be an automatic warning when you use too many resources on the login node.
  • On the topic of platforms for interaction between users<->staff and among users. I get Aalto is currently more organized on this than other universities. Would it be possible to have a Zulip/Mattermost/Matrix/whatever interactive channel common to scicomp users across Finnish/Nordic universities? And maybe get the staff from different universities on board to use that platform also for users<->staff interaction?

    • I would like this as UH doesn't have something like this.
    • coderefinery.zulipchat.com is something like this, but not exactly. It has users and admins accross the Nordics.
      • This looks like a great platform for us outside Aalto. It wasn't entirely clear so far that coderefinery.zulipchat.com was open to outsiders!
        • Just to clarify then, CodeRefinery is a NeIC project and not connected to Aalto.
  • When talking about the XY problem, it might help to use the terminology of "terminal goals" and "instrumental goals". Usually you want to achieve a specific terminal goal, but to get there you need to achieve several instrumental goals. And to help you, support needs to know your terminal goals.

  • How to Ask Questions the Smart Way: http://www.catb.org/~esr/faqs/smart-questions.html

  • (Helsinki) In my experience with IT support for SciComp remember to include which VPN you are using and what your IP address is using something like ipchicken.com

  • How many nodes is too many?

    • Depends on your problem. Usually, you start and scale up until the speed up becomes less than is worth it. We will talk about this more tomorrow.
  • Did I understand correctly? more nodes = increased speed.

    • Yes, if the software can handle it. More on this tomorrow.
  • Some tasks require specifying how many threads to use. Are the number of threads and the number of cores related?

    • Normally you want one thread per core for optimal performance of CPU-bound software. More on this tomorrow.

I am watching right now:

  • yes: ooooooooooooooooooo
  • no: o
    (if not, join zoom and it is shared there)

Zoom link: https://aalto.zoom.us/j/69608324491

Fastly status: https://status.fastly.com/incidents/vpk0ssybt3bj

  • The Aalto garages are great!! I totally recommend that other universities develop something similar

    • Thanks!
  • Do we also have permission to install software on the VDI?

    • No, but you can try requesting it. The module system works there (in some cases). Usually they can install popular tools. More niche requests are better for Triton, and VDI can be used as the graphical interface to triton.
    • Radovan sorry if I gave wrong info here. I extrapolated a bit from machines I know, which is always limited.
  • How can I see and track memory usage of my script, if e.g. running it first (with one subject) on Spyder with python? I can get time, but memory is always just a guess

    • If you do not need "live tracking", you can run it as a "serial job" (more on this later) and then check the job efficiency at the end (more on this later)
  • Can you turn Simo up please?

    • its good now!
  • So was the twitch session recorded now that it's down?

    • it's recorded locally.
    • my local software that is broadcasting it is recording it locally
    • twitch works for me, it's probably an internet problem
    • The ultimate problem is one Content Distribution Network is down, so you can't start up new sessions on certain webpages.
  • AWS is down, Google Cloud works.

  • Stack overflow is down -> it will take a week to solve this internet issue :D :D

    • haha. yes.
    • indeed
    • we're doomed
  • Follow-up on the thread question: if I am not working on a cluster but have a designated node, how do I define the number of cores or threads to increase the speed? (I am not sure if this question belongs to this course)

    • this depends on how the code has been written/parallelized. is this your code or do you have access to source code or documentation which discusses how it is parallelized? if at all? some programs are not able to use more than one task/thread/core.
    • but answering differently: some programs use all available cores (and this can be limited by setting environment variables such as OMP_NUM_THREADS), some use only 1.
  • Are these internet issues really global? I am currently in Switzerland and here it works fine :-)

  • Coderefinery stream in twitch seems to work again: https://www.twitch.tv/coderefinery

    • Yes from this side of Helsinki!
  • Lost the audio on the stream

  • Now it works, I forgot I had muted the tab

    • It works now, maybe a refresh will help? The person who was sharing on zoom stopped sharing.
    • remember to unmute twitch
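Following up on the thread questions above (one thread per core, and OMP_NUM_THREADS to limit programs that would otherwise use every core they see): a minimal sketch for an OpenMP-style program inside a Slurm job. SLURM_CPUS_PER_TASK is set by Slurm only when you request --cpus-per-task; outside a job the sketch falls back to one thread. The program name is hypothetical, and programs that do not read OMP_NUM_THREADS need their own thread setting.

```bash
#!/bin/bash
# Match the number of OpenMP threads to the cores Slurm reserved for this task
# (one thread per core). Outside Slurm, or without --cpus-per-task, use 1 thread.
set_omp_threads() {
  export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_omp_threads
echo "Running with OMP_NUM_THREADS=$OMP_NUM_THREADS"
# ./my_openmp_program   # hypothetical program that reads OMP_NUM_THREADS
```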

Data storage

https://scicomp.aalto.fi/triton/tut/storage/

  • If I run a program on the cluster, is my output authomatically stored on the cluster?

    • If it saves to files, yes. It depends on if and how the program works, but you would usually configure it to save.
  • About caching: e.g. nilearn wants to use caching by default. Is that a good idea on Triton? Or should I rather turn that off and request more memory for my scripts?

    • It depends on the size of the data. If you have huge files (e.g. fMRI with loads of time points) more memory makes a difference. In general it is difficult to feel the difference (or at least I did not feel it when testing).
      • I know what you mean, I think trial and error with some files and then run it for everyone.
        • Yeah I was thinking in terms of I/O - I don't know what happens with the cache under the hood, whether it creates bad I/O patterns
        • It uses joblib.Memory, i.e. it uses the disk as a memory. It can be OK most of the time in my experience.
  • What should I do if I don't know if my job is going to cause much I/O load?

    • First make an estimate about amount of data it uses and reads in: both bytes and number of files (back of envelope calculation). Dropping by garage with these calculations is a good step
      • Come to garage and we'll take a look together!
        • Ok, thanks
  • How can I easily check how much quota/memory I have left?

    • the quota command
  • What is the best way to copy files to a cluster directory? E.g Simo's copy_work.sh

    • this is upcoming at bottom of this page
  • On Aalto managed computers, there's a synced Desktop and Documents folder. Which directory is this exactly when accessing from Triton?

  • for me quota on Kale does not work, says permission denied.

    • same here
    • run lfs quota -hu $USER /wrk
      • thanks!
        • I get the 'not on a mounted Lustre filesystem' message with this command. What does this mean?
      • Which host is this? If you do 'lfs df', do you get any output? Usually this error means that the target directory is not lustre FS (eg. points to other than /wrk).
  • If I have a coding project as a summer project, should I put my codes and results to home or work directory or neither of these?

    • Home is for small personal files, use work (or even better a shared project folder with your team members so they can access your code+results after you have left)
    • +1 for group directories, if it's in a personal location it will die when you leave. is that good? depends on project
      • Well I work the project alone. So I think, I can use work directory and when the project is done I can put it also to a shared folder with the team.
  • Is it a good idea to clone a GitHub repository inside Triton and use it as a group folder?

    • cloning git repositories is good
    • do you want to share the same cloned repo among multiple people, or each person clones their own working copy?
      • Multiple persons inside the same group, working for the same project
        • Having multiple users working on the same git repository will most likely result in permission problems as you're doing commits etc.. Better idea would be to have a shared git repository on version.aalto.fi or github that everybody would use as a centralized repo. Then everybody would take their own clones of that centralized repo to their own place in the project folder / user folder.
  • git (and even git-lfs) does not necessarily play well with binary files and/or large amount of text files collecting highly volatile datapoints. Are there versioning systems you can recommend to do versioning of this kind of large-data files?

    • rkdarst I like git-annex but it's complex to use. In general these data management things are a complex thing, and solutions are dependent on the project itself
  • are you answering this at the moment? Can I mount a network drive on Kale?

  • which services do you recommend to store big datasets and once in a third-party location, how to access it and download it to triton efficiently?

    • you can have a folder on triton's scratch with a copy of the data, or if the data is updated often (e.g. NETCDF type of things) you can sync it when needed . It depends on many factors of course (what type of data, license of the data, data use agreement) so come to garage and let's talk :)
      • Thanks!
  • scp is evilly insecure; always prefer sftp, rsync or rsync-over-ssh to it!

    • how is it insecure? I've heard this too but not details, it uses ssh so
    • that is new to me that it would be insecure.
    • and rsync would work over ssh, right?
      • One can connect with an rsync client to a dedicated rsync daemon process (e.g., rsync rsync://some.host.com/path/to/file.txt ....), but in most cases it happens over an SSH connection (technically any remote shell, but usually SSH, i.e. when you run rsync [user@]kosh.aalto.fi:/path/to/some/file.txt ./). In the latter case, upon connecting to the remote endpoint, the remote shell executes an ephemeral rsync process on the server side, which lives for the duration of the transfer. The rsync client then does its operations by talking to the remote rsync process it just spawned.
      • rsync is better and also has the advantage of being "smart": does not copy what is identical between the source and destination
  • How to leave the text file I created ?

    • What editor do you use?
    • I have just created the text file like Simo did, but then it opened the file. I typed something, but when I press "enter" it goes to the next line, so I don't know how to save the text file and exit back
    • What command did you use to open the file? Different editors require different commands to close.
  • Does it work the other way around?

    • from remote to local? Yes: rsync remotefile localfolder
  • I wish the documentation would have a plusses/minuses table for the data transfer methods. I need instructions for Dummies to be able to make decisions on what's best for me! :D

    • good idea!
  • How is mounting different from copying?

    • The data stays remote and you just load what you need dynamically
  • does sftp://triton.aalto.fi/scratch/work/USERNAME also work for Kale or how do I do this there?

  • Are the SMB remote mounting instructions for Windows correct? It says "In Windows 10 → “This PC” → right click → “Add Network Location”." but I think "Map Network Drive" is correct here, based on the context and the fact that's how I managed to mount it

    • You are right, map network drive is correct
    • Actually, both work just fine. It's just a matter of convenience. "Add network location" gives you a quick link, usually in the menu on the left-hand side. "Map network drive" additionally assigns a letter to the share. But both give you the same end result.
      • Okay! I think the "In Windows 10 → “This PC” → right click → “Add Network Location”. (Note that this is different from right-click “Add network location” which just makes a folder link and has had some problems in the past.)" part could be reworded though, it's rather confusing atm
  • I got "The folder you entered does not appear to be valid" on Windows 10 using "Add network location" with smb://data.triton.aalto.fi/scratch/. What could be the issue?

    • Slashes should go the other way around with Windows: \\data.triton.aalto.fi\scratch\
      • Oh yeah, I was copying the wrong link. But the result was still the same: smb:\\data.triton.aalto.fi\scratch\
        • Without "smb:" :) "replace smb:// with \\" from the page
          • Yeah, I missed that completely
        • Note to self: we need to clarify this a bit more in the documentation. But yes, Windows wants the "\\<servername>\<sharename>" format.
    • +1, having the TL;DR table would help (different OSs and different options)
    • I can't sign in to data.triton.aalto.fi using my Aalto account though
      • If you like, raise your hand in Zoom. It's easier to have a look there.
      • Make sure you are on VPN
      • And when Windows asks for the account, the format is "AALTO\<account>". This may not always be necessary (e.g. when using an Aalto laptop), but it should always work if you give the full name with capital AALTO.
  • I tried to use sshfs and it says -bash: sshfs: command not found

    • I guess it's not installed. On Linux you can install it through the package manager; on other OSs it's harder.
      • You can try some other methods as well for copying stuff. E.g. scp, rsync.
  • How do I set up the ssh config file so that it connects through multiple hosts?

  • After setting up the network folder in Windows (smb://data.triton.aalto.fi/work/$username/) and creating some files there from Windows, is there a delay before I see them on Triton when connecting with PuTTY?
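On the ssh question above about connecting through multiple hosts: a minimal sketch of an ~/.ssh/config, assuming OpenSSH 7.3 or newer (the version that added ProxyJump) and using kosh.aalto.fi as the intermediate host. The aliases kosh and triton and the name USERNAME are placeholders, so adjust them to your own account and hosts.

```
# ~/.ssh/config (sketch; USERNAME and the Host aliases are placeholders)
Host kosh
    HostName kosh.aalto.fi
    User USERNAME

Host triton
    HostName triton.aalto.fi
    User USERNAME
    # connect to kosh first, then hop from there to triton
    ProxyJump kosh
```

With this in place, ssh triton (and tools built on ssh, such as scp, rsync and sshfs) should go through kosh automatically. The command-line equivalent is ssh -J kosh triton. ProxyJump also accepts a comma-separated list of hosts if you need more than one hop.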

Exercise until xx:10

https://scicomp.aalto.fi/triton/tut/storage/#exercises

  • Try #2 only, don't spend too long on others. Don't try anything about "filesystem performance"

I managed to:
copy something: ooooooooooooooooooooooooooo
remote mount a dir: oooooooooooooooo
neither copy/mount: oooooooooo
learn something: oooooooooooooooooooo

  • -> For University of Helsinki: Zoom breakout room 1 will have a quick intro to our storage.

    • If HY people need more time, write here.
    • A few minutes more, please
    • we are done now.
  • (Aalto) When I try this on kosh using sshfs, it says "-bash: sshfs: command not found". I've created a directory in my kosh home.

Interactive jobs

https://scicomp.aalto.fi/triton/tut/interactive/

  • (Helsinki) Do we have the same queuing system? If so, are there any estimates of when work will be completed?
    • Work completion depends on what your analysis does. There are some ways to estimate the waiting time in the queue; I can find the link. On Triton/Aalto: slurm j <jobID>. Not sure about the Helsinki clusters.
  • Have I been allocated computing time just by logging in? Should I log off of the cluster if I'm not doing anything?
    • Logging in to the login node does not allocate a job in the queue.
  • (Helsinki) Any tips on how to estimate the resources required if you have not run your code before in the cluster machine?
  • what if my program takes longer than the requested time?
    • It gets killed. You need to re-submit it with longer time estimate.
  • (Helsinki) When I run srun I get the following error: "Do not use srun to start INTERACTIVE sessions. Please start your interactive session with 'interactive' command.". Any advice what to do there?
  • Are srun and grun same thing?
    • I'm not sure what grun is, but there can be other wrapper commands and there are also other queuing systems.
  • How can we know how much memory we need?
    • If you have no idea, start with an educated guess. If it is way off, you can check that later and adjust accordingly for the next job. If you ask for too little, your job will fail (get killed) with a "ran out of memory" error. But you also don't want to ask for much more than you need (e.g. >2x): the job will then spend more time in the queue waiting for unneeded resources.
  • Once I had a data analysis task (in R) that required 200 GB of RAM, and I had to use swap to get around that. What would be the way to complete the same analysis on the cluster?
    • You can request 200GB of RAM (up to 3TB of RAM if I recall correctly) if you need that. Just remember that the more you request, the more you might have to wait in the queue.
  • (Kale) Error: slurmstepd: error: execve(): python3: No such file or directory
    • try python only? Or else you need to load a module, more on this later.
    • Fixed! Ran module load anaconda, then checked python by typing python or python3.
  • I did not know about this Slurm system and have so far just been running my code on Triton without the srun command
    • i.e. you were running things in the login node. Sometimes people forget that and the login node gets overcrowded. It is always good to put srun first or submit your job as a non-interactive one (tomorrow on this).
  • so all the packages that we install when being in the home directory will also be available when executing srun?
    • yes the home directory comes with your process to any node in the cluster (as well as the /scratch/ or /work/ folders)
  • Error:
srun -p interactive --time=2:00:00 --mem=600M --pty bash
srun: error: invalid partition specified: interactive
srun: error: Unable to allocate resources: Invalid partition name specified
  • How to fix this?
    • On other sites there may be no partition called "interactive", so leave out -p interactive or use a partition that exists there.
    • How to find a list of partitions on my cluster?
  • Once my task is completed, do I still have this memory allocation? I.e. if I run srun for 1 hr with 100M of RAM, will the reservation disappear, or stay valid for exactly 1 hr?
    • If your task is completed before the time you requested, all resources are released (i.e. you can request for more than what you need, but as usual if you request for too much you might end up waiting longer in the queue)
  • In turso there is no 'interactive' partition, please use 'test' instead.
    • Working example for turso:
/usr/bin/srun python -c 'import os; print("hi from", os.uname().nodename)'
  • On Turso, an interactive session can be started with the interactive command, for example: interactive cpu 1 1
  • If both Triton and my PC run a program in the same 1 hour, what's the point of using Triton to run my program? I was thinking the cluster should speed up the runtime somehow.
    • This should be discussed in the stream
    • One point about sensitive data: sometimes you should not take data outside Aalto network, then you need to run it remotely.
    • If you need to run multiple copies, you can run them at once on Triton
  • what if the job goes longer than the reserved time?
    • It will be killed and you need to re-submit it with longer time requested.
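On the question above about finding the list of partitions: Slurm's sinfo command can print a one-line summary per partition. Below is a minimal sketch, assuming a standard Slurm installation. The name list_partitions is a hypothetical helper, and the fallback message only exists so the snippet also runs off-cluster.

```shell
#!/usr/bin/env bash
# Show the partitions of the Slurm cluster you are logged in to, so you know
# which name to pass to srun -p. list_partitions is a hypothetical helper name.
list_partitions() {
    if command -v sinfo >/dev/null 2>&1; then
        # --summarize: one line per partition (availability, time limit, node counts)
        sinfo --summarize
    else
        echo "sinfo not found: run this on a cluster login node"
    fi
}

list_partitions
```

Pick one of the listed names for srun -p. On Turso, for example, that would be test rather than interactive, as noted above.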

Exercise until xx:35

"Your first interactive job"
https://scicomp.aalto.fi/triton/tut/interactive/#your-first-interactive-job

  • (Helsinki) There's no interactive command on Kale, but it is available on Turso. What is the difference between Kale and Turso?
    • For Kale, you can use: srun -p test --time=00:10:00 --mem=100M --pty bash
  • (Helsinki) I used srun after starting the interactive session, but nothing happens. What am I doing wrong?
    • You use interactive to start an interactive session on a worker node. If you just want to run a command without an interactive session, you should use srun.
  • I tried to run the example srun line and now it says >quote and I can't really do anything
    • Ctrl+C. Maybe you forgot the closing apostrophe?
      • that was it, thanks
  • (Helsinki) I ran interactive gpu 1 1 which gave:
salloc: Pending job allocation 136665691
salloc: job 136665691 queued and waiting for resources
salloc: error: Lookup failed: Unknown host
salloc: job 136665691 has been allocated resources
salloc: Granted job allocation 136665691
salloc: Waiting for resource configuration
salloc: Nodes ukko2-g02 are ready for job

Then I ran: srun python3 -c 'import os; print("hi from", os.uname().nodename)', it is still running (~5 mins already)
Then it gave: srun: Job 136665691 step creation temporarily disabled, retrying. It is still running.

  • Running without srun in front of my commands makes the code run much faster even when not accounting for the wait time. Why is this?

    • It runs in the node where you are, i.e. the login node (no need to connect to another node etc). Which is ok for something short but if everyone did that the login node would be unusable
    • But why is the login node so much faster than the compute node? I just ran a little particle simulation that takes 1 minute without srun, but if I add srun I can see that it runs much slower. My simulation reports progress after each batch of 1 million particles; on the login node 1M particles take less than a second, but with "srun --mem=25G" the same 1M particles take about 10 s. That is less than 1/10 of the speed on the login node.
      • Because you have no limits on RAM or CPUs when running things on login node.
      • Is there any way to get the same speed as on the login node without overloading it? Because that punishes the people who use the system correctly and rewards the people who just run their code on the login node.
  • (Helsinki) Could you also update the srun instruction for turso? It does not work. I also tried interactive gpu 1 1 and then the srun command. It keeps running forever.

  • (Helsinki) I ran /usr/bin/srun --mem=50M python hpc-examples/slurm/memory-hog.py 1000M but it worked, no errors. Why is that?

  • It also worked fine on kale with 5000M setting

  • It first says "trying to hog 5000000000 bytes of memory" and then it works

    • The job most likely finished before the memory killer was engaged. Like we said, there's some leeway in the memory limits. Try a higher amount of memory for the memory-hog.
    • Actually, there seems to be a configuration error in Slurm on Turso, which we have to fix ASAP. Currently our Slurm seems to NOT kill out-of-memory jobs!
      • could it be polling every 60 seconds before killing (like Triton used to before we switched to cgroups?)
    • I opened a Jira bug for the issue for our team.
    • This is likely a bug in cgroups configuration.
  • Where should I pip install python modules? On the login node?

    • Yes, these are light tasks and can be run on the login node. Check that not too many people are using it, though.
    • $HOME is so small, that you probably want to pip install under $PROJ instead of $HOME.
  • Could you run docker/podman on the cluster workers?

    • Docker is not an option on shared systems. Singularity can be used instead, and it can run Docker images straight from Docker Hub.
      • Singularity? is there a link with more info?
    • It is similar to docker to run containers https://scicomp.aalto.fi/triton/usage/singularity/
  • What was the point of the --pty switch?

    • --pty gives you a pseudo-terminal, as in --pty bash
    • It makes it slightly more interactive, you see output right away
  • if I book a certain amount of memory and use less, is it still blocked for other users?

    • Yep, in principle it is reserved only for your job.
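On the memory questions above (how to check usage afterwards and adjust, and the fact that reserved memory stays blocked even if unused), Slurm's accounting command sacct can show how much memory a finished job requested and how much it actually used. This is a minimal sketch, assuming job accounting is enabled on your cluster. The name job_memory_report and the job ID 123456 are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Report requested vs. used memory for a finished Slurm job.
# job_memory_report is a hypothetical helper name, not a Slurm command.
job_memory_report() {
    local jobid="$1"
    if command -v sacct >/dev/null 2>&1; then
        # ReqMem: memory requested; MaxRSS: peak memory used (reported per job step)
        sacct -j "$jobid" --format=JobID,State,Elapsed,ReqMem,MaxRSS
    else
        echo "sacct not found: run this on a cluster login node"
    fi
}

# 123456 is a placeholder; use the job ID printed when your job started
job_memory_report 123456
```

MaxRSS is typically filled in on the job-step lines (e.g. 123456.0) rather than the top-level job line. If it is far below ReqMem, you can request less memory for the next run and spend less time in the queue.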

Feedback of the day

I would prefer the course more like:
structured lecture: ooooo
exercises: ooooooooooooooo
demos: oooooooooooooooooo
like it is now: oooooooooooooooooo

This course format is:
is better than a lecture: oooooooooooooooooooo0ooo
needs more organization: oooooooooo
needs more hands-on examples: ooooooo0ooooooooooooooooooooo
should be more lecture-like: o
OK as it is: oooooooooo

One good thing:

  • Very nice examples and hands-on experience, as well as feedback and help.
  • the part on getting help was great
  • Your discussion style "lectures", so much easier to follow for 4h than one person speaking only!
  • The talks are really good and the demos are good but might be worth having a text window with notes on screen.
  • Very nice interactive sessions and the instructors are very helpful.
  • Much better than a lecture format since you can do something else if we are going through stuff like git, etc that people might be familiar with already.
  • HackMD is great!
  • HackMD was super effective (Good for introverts :D )
    • +1
  • Instruction on writing the email for help, the XY example was great.
  • I am not tired after loooooong hours of lectures. Your lectures are interesting and effective.
  • I like the questions and answers approach during the presentations, where one of you guys asks the simple questions for us.
    • +1
  • I really like the discussion format of the lectures it makes it much easier to follow along and so much more engaging. HackMd is really good too, it's really useful to have a record of the questions so we can refer back to them at a later time.

One thing to be improved:

  • Multiple communication channels are hard to follow simultaneously
    • +1 +1
  • Incompatibilities between the different universities' platforms make it very hard to understand the ongoing material. In addition to learning the new things, you have to troubleshoot and look for solutions because of the different setups. I would like to have more universal commands, or separate ones suitable for each cluster. At the end of the day, we have to listen to the "lecture", and then go into breakout rooms and ask how to do the same on our cluster.
  • The docs online are great but would it be possible to include examples/commands for Kale/Turso? For example the interactive shell for Kale/Turso is different to Aalto
  • Too much focus on Aalto specifics. For very university-specific parts, maybe the course could split into Zoom breakout rooms for the entire time the part is explained. Now UH people had to spend a bit too much time on things not affecting them at all.
  • Would be great to have "course material" with commands for all 3 universities clusters next to each other so everybody could easily follow on stream screen.
    • +1
  • As a complete beginner and a Helsinki student it's hard to learn the commands and language because I can't know what is Aalto specific and what works in Helsinki. Non-Aalto clusters need more focus.
  • It is sometimes unclear if you're supposed to type along or not. I would also like more detailed explanations on what…
  • a list of useful scripts would be useful (especially for tomorrow)!
    • +1+1+1+1
  • It would be useful if you could be a bit more specific about what to know in advance for the hands-on demos. Many questions were actually answered in the material sent in the invitation email, but people probably didn't know which parts were important to check out.
  • Is anyone really expecting us to remember all these quick example commands, and then refer to long manuals where it easily takes a long time to find exactly what you are looking for (there are so many options...)? It would be good to be able to do at least something after this course. (Well, I did learn some basic things.)
    • write them down so you don't have to remember instantly
      • need to write really fast ;-)
        • The terminal has a history command, so if you're following the simple examples you can see the commands later
    • You'll probably find yourself referring a lot to the scicomp documentation; there's no way to remember most of this from one introductory course (https://scicomp.aalto.fi/triton/#tutorials)! There are lots of examples there, though I do feel there could always be more.
  • Was a bit difficult to follow the multiple information channels while trying to figure out what to do in time before continuing
    • +1
  • some course material examples have unnecessarily complex extra commands, could just have the simplest command as example (and then later example for extra commands)
    • +1+1
  • I would like to be able to type along with the demos, and to have a document or notes with the same commands, in tutorial style, that follows exactly what the demo is doing (and it would be great to have the equivalents for the other universities too, so everyone can type along)
    • +1
