![](https://github.com/computecanada/magic_castle/raw/assets/logo.png =x100)

# [Magic Castle](https://github.com/ComputeCanada/magic_castle): Terraforming the Cloud to Teach HPC

### November 12, 2023

## General information

- This HackMD: https://hackmd.io/@MagicCastle/SC23
- Create an account for hands-on: https://mokey.sc23.magiccastle.live/
    - Keep a record of your password!
    - lowercase usernames only
- Quiz: https://quiz.magiccastle.live/
- GitHub repo: https://github.com/ComputeCanada/magic_castle
- Tutorial slides: https://docs.google.com/presentation/d/e/2PACX-1vR-PMvGbWuQTJrzNCws9BEcogZyX2HlzHDQrsgGbOPRsvBbl-8iIS1o_9VAAjqv406BpZkGI2-gW8jF/pub?start=false&loop=false&delayms=30000
- Workshop feedback form: https://submissions.supercomputing.org/?page=SessionEval&new_year=sc23&id=sess216&eval_stype=stype461

### Schedule

**November 12, 13:30-17:00 MST**

| Time | Topic |
| --- | --- |
| 13:30-13:45 | Welcome and setup |
| 13:45-13:55 | Creating a Magic Castle Cluster in 5 minutes |
| 13:55-14:20 | Terraforming the Cloud to Teach HPC |
| 14:20-15:00 | Magic Castle |
| 15:00-15:30 | Break |
| 15:30-16:45 | Hands-on exercises |
| 16:45-17:00 | Break |
| 11:20-12:00 | Q&A |

---

### Instructors

- Félix-Antoine Fortin
- Alan O’Cais (he/him, CECAM/University of Barcelona, [@ocaisa](https://github.com/ocaisa))
- Lydia Vermeyden
- Darren Boss

---

### Code of conduct

The SC Conference is dedicated to providing a harassment-free conference experience for everyone, regardless of gender, sexual orientation, disability, physical appearance, race, or religion. We do not tolerate harassment in any form.

:::spoiler Contributor Covenant Code of Conduct
During this tutorial, we strive to follow the [Contributor Covenant Code of Conduct](https://www.contributor-covenant.org/version/2/1/code_of_conduct/) to foster an inclusive and welcoming environment for everyone.

[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg)](https://www.contributor-covenant.org/version/2/1/code_of_conduct/)

In short:

- Use welcoming and inclusive language
- Be respectful of different viewpoints and experiences
- Gracefully accept constructive criticism
- Focus on what is best for the community
- Show courtesy and respect towards other community members

Contact details to report CoC violations can be [found here](https://sc23.supercomputing.org/attend/code-of-conduct/).
:::

---

:::danger
You can ask questions about the workshop content at the bottom of this page.
We use the videoconferencing chat only for reporting videoconferencing problems and such.
:::

---

## Questions, answers, discussion and information

- is this how to ask a question?
    - yes, and an answer will appear like so!
- Have you tested with [OpenTofu](https://opentofu.org/)?
    - Slide on OpenTofu coming up
    - Turns out the slide is very far down the list. Yes, it should work, but I don't believe we have tested it just yet
- Can you do 'demand-driven' auto-scaling (possibly within a maximum number of configured nodes)?
    - E.g. when a job is submitted, it creates the nodes necessary for the job, then shuts the nodes down & deletes them when the job is done.
    - Just to check -- the on-demand nodes are only shut down after the queue is empty? (i.e. those nodes stay up while jobs are waiting for resources?)
- Terraform Cloud alternatives:
    - https://medium.com/@elliotgraebert/four-great-alternatives-to-hashicorps-terraform-cloud-6e0a3a0a5482
    - https://www.runatlantis.io/
    - https://docs.gitlab.com/ee/user/infrastructure/iac/
        - Not for Magic Castle, but we have some other infrastructure deployed with Terraform for which we use GitLab's Terraform functionality as the state store, and it works well
- Is the problem that Magic Castle is connected to GitHub? Or Terraform Cloud? Or am I misunderstanding the problem?
    - I think I see. You need something with an HTTP API.
      Not 'Git Ops'.
- When you change the number of nodes, will it restart the Slurm daemon? In other words, will it kill existing, running Slurm jobs?
    - It will not.
- What was the password again for Julian?
    - We will all be creating our own clusters after the break, so we don't need the password post break

## Exercise 1

- `ssh yourusername@sc23.magiccastle.live`
- `terraform version`
- `source cloud-creds.sh`
- `tar xvf magic_castle*.tar.gz`
- `mv magic_castle-aws-13.1.0 mycluster`
- `cd mycluster`
- `nano main.tf`
- Set a unique cluster name, save, and then exit nano
- `terraform init`
- `terraform plan -out=myplan.zip`
- `terraform apply myplan.zip`

## Exercise 2

- `ssh -A centos@<your-ip-address>`
- `tail -f /var/log/cloud-init-output.log`
- `journalctl -u puppet -f` (press Ctrl-C to leave journalctl)
- `ssh mgmt1`
- `tail -f /var/log/cloud-init-output.log`
- `journalctl -u puppet -f`

## Exercise 3

- `nano main.tf`
- Uncomment the dns module by removing the `#` in front of lines 63 to 75
- `source cloud-creds.sh`
- `terraform init -upgrade`
- `terraform plan -out=my-plan.zip`
- `terraform apply my-plan.zip`
- `ssh -A centos@<your_username>.magiccastle.live`

## Exercise 4

- `nano main.tf`
- Add `"proxy"` to login1's tags array
- `terraform plan -out=my-plan.zip`
- `terraform apply my-plan.zip`
- `nano data.yaml`
- `source cloud-creds.sh`
- `terraform plan -out=my-plan.zip`
- `terraform apply my-plan.zip`

## Plugs & resources

- Jetstream2
    - https://jetstream-cloud.org
    - NSF-funded OpenStack cloud resources for US-based researchers
    - GPU & large memory nodes available
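
As a recap of the exercises above: Exercises 1, 3 and 4 all repeat the same cycle of loading credentials, then running `terraform init`, `plan` and `apply`. That cycle could be wrapped in a small bash helper, sketched below. The function name `mc_apply` and the `DRY_RUN` variable are hypothetical (they are not part of Magic Castle or Terraform), and the sketch assumes `cloud-creds.sh` can be sourced from the current directory, as in the exercises. The `terraform` subcommands and flags are the ones already used in the exercises.

```shell
# Hypothetical helper (not part of Magic Castle): re-run the
# credentials / init / plan / apply cycle from the exercises.
# Assumes bash and that cloud-creds.sh is reachable from the current directory.
mc_apply() {
  # Plan file name; defaults to the one used in Exercises 3 and 4.
  local plan_file="${1:-my-plan.zip}"

  # With DRY_RUN=1, only print the commands that would be run.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' \
      "source cloud-creds.sh" \
      "terraform init -upgrade" \
      "terraform plan -out=$plan_file" \
      "terraform apply $plan_file"
    return 0
  fi

  # Run each step in order and stop at the first failure.
  source cloud-creds.sh &&
    terraform init -upgrade &&
    terraform plan -out="$plan_file" &&
    terraform apply "$plan_file"
}
```

Running `DRY_RUN=1 mc_apply` from the cluster directory only lists the commands. Running `mc_apply` executes them, so edit `main.tf` or `data.yaml` first, as in the exercises.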