
# [Magic Castle](https://github.com/ComputeCanada/magic_castle): Terraforming the Cloud to Teach HPC
### November 12, 2023
## General information
- This HackMD: https://hackmd.io/@MagicCastle/SC23
- Create an account for hands-on: https://mokey.sc23.magiccastle.live/
    - Keep a record of your password!
    - Lowercase usernames only
- Quiz: https://quiz.magiccastle.live/
- GitHub repo: https://github.com/ComputeCanada/magic_castle
- Tutorial slides: https://docs.google.com/presentation/d/e/2PACX-1vR-PMvGbWuQTJrzNCws9BEcogZyX2HlzHDQrsgGbOPRsvBbl-8iIS1o_9VAAjqv406BpZkGI2-gW8jF/pub?start=false&loop=false&delayms=30000
- Workshop feedback form: https://submissions.supercomputing.org/?page=SessionEval&new_year=sc23&id=sess216&eval_stype=stype461
### Schedule
**November 12, 13:30-17:00 MST**
|Time | Topic|
| --- | -----|
|13:30-13:45 | Welcome and setup|
|13:45-13:55 | Creating a Magic Castle Cluster in 5 minutes|
|13:55-14:20 | Terraforming the Cloud to Teach HPC|
|14:20-15:00 | Magic Castle|
|15:00-15:30 | Break|
|15:30-16:45 | Hands-on exercises|
|16:45-17:00 | Break|
|11:20-12:00 | Q&A|
---
### Instructors
- Félix-Antoine Fortin
- Alan O’Cais (he/him, CECAM/University of Barcelona, [@ocaisa](https://github.com/ocaisa))
- Lydia Vermeyden
- Darren Boss
---
### Code of conduct
The SC Conference is dedicated to providing a harassment-free conference experience for everyone, regardless of gender, sexual orientation, disability, physical appearance, race, or religion. We do not tolerate harassment in any form.
:::spoiler Contributor Covenant Code of Conduct
During this tutorial, we strive to follow the [Contributor Covenant Code of Conduct](https://www.contributor-covenant.org/version/2/1/code_of_conduct/)
to foster an inclusive and welcoming environment for everyone.
In short:
- Use welcoming and inclusive language
- Be respectful of different viewpoints and experiences
- Gracefully accept constructive criticism
- Focus on what is best for the community
- Show courtesy and respect towards other community members
Contact details to report CoC violations can be [found here](https://sc23.supercomputing.org/attend/code-of-conduct/).
:::
---
:::danger
You can ask questions about the workshop content at the bottom of this page. Please use the videoconferencing chat only to report videoconferencing problems.
:::
---
## Questions, answers, discussion and information
- Is this how to ask a question?
    - Yes, and an answer will appear like so!
- Have you tested with [OpenTofu](https://opentofu.org/)?
    - Slide on OpenTofu coming up.
    - It turns out the slide is very far down the list. It should work, but I don't believe we have tested it yet.
- Can you do 'demand-driven' auto-scaling (possibly within a maximum number of configured nodes)?
    - E.g. when a job is submitted, it creates the nodes necessary for the job, then shuts the nodes down and deletes them when the job is done.
- Just to check -- the on-demand nodes are only shut down after the queue is empty? (i.e. those nodes stay up while jobs are waiting for resources?)
- Terraform Cloud alternatives:
    - https://medium.com/@elliotgraebert/four-great-alternatives-to-hashicorps-terraform-cloud-6e0a3a0a5482
    - https://www.runatlantis.io/
    - https://docs.gitlab.com/ee/user/infrastructure/iac/
        - Not for Magic Castle, but we have other infrastructure deployed with Terraform that uses GitLab's Terraform functionality as the state store, and it works well.
- Is the problem that Magic Castle is connected to GitHub? Or Terraform Cloud? Or am I misunderstanding the problem?
- I think I see. You need something with an HTTP API. Not 'Git Ops'.
- When you change the number of nodes, will it restart the Slurm daemon? In other words, will it kill existing, running Slurm jobs?
    - It will not.
- What was the password again for Julian?
    - We will all be creating our own clusters after the break, so the password will not be needed then.
## Exercise 1
- `ssh yourusername@sc23.magiccastle.live`
- `terraform version`
- `source cloud-creds.sh`
- `tar xvf magic_castle*.tar.gz`
- `mv magic_castle-aws-13.1.0 mycluster`
- `cd mycluster`
- `nano main.tf`
- `# (Set a unique cluster name, save, and then exit nano)`
- `terraform init`
- `terraform plan -out=myplan.zip`
- `terraform apply myplan.zip`
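
The init, plan, and apply steps above can be wrapped in a small helper so the same sequence is easy to repeat. This is only a sketch: `run`, `deploy_cluster`, and the `DRY_RUN` variable are hypothetical names introduced here, not part of Magic Castle or Terraform. With `DRY_RUN=1` (the default in this sketch) the commands are printed rather than executed.

```bash
# Sketch of the Exercise 1 workflow. The names run, deploy_cluster and
# DRY_RUN are illustrative assumptions, not Magic Castle features.

# Print the command when DRY_RUN=1 (the default here); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Initialise, plan, and apply from the directory given as $1.
# $2 optionally names the plan file (defaults to myplan.zip).
deploy_cluster() {
  (
    cd "$1" || exit 1
    plan="${2:-myplan.zip}"
    run terraform init &&
      run terraform plan -out="$plan" &&
      run terraform apply "$plan"
  )
}
```

For example, `deploy_cluster mycluster` prints the three commands, and `DRY_RUN=0 deploy_cluster mycluster` runs them. Source `cloud-creds.sh` first, as in the exercise.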
## Exercise 2
- `ssh -A centos@<your-ip-address>`
- `tail -f /var/log/cloud-init-output.log`
- `journalctl -u puppet -f` # Ctrl-C to leave journalctl
- `ssh mgmt1`
- `tail -f /var/log/cloud-init-output.log`
- `journalctl -u puppet -f`
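
Instead of watching the log by eye, one could poll it for cloud-init's closing message. The sketch below assumes the final line of `/var/log/cloud-init-output.log` has the usual form `Cloud-init v. <version> finished at ...`; that wording is an assumption about the installed cloud-init version, and `cloud_init_finished` is a hypothetical helper name.

```bash
# Return 0 once the log file given as $1 contains cloud-init's closing line.
# The matched wording is an assumption about cloud-init's output format.
cloud_init_finished() {
  grep -q 'Cloud-init v\..* finished at' "$1"
}
```

A possible use on a node: `until cloud_init_finished /var/log/cloud-init-output.log; do sleep 10; done`.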
## Exercise 3
- `nano main.tf`
- Uncomment the dns module by removing the `#` at the start of lines 63 to 75
- `source cloud-creds.sh`
- `terraform init -upgrade`
- `terraform plan -out=my-plan.zip`
- `terraform apply my-plan.zip`
- `ssh -A centos@<your_username>.magiccastle.live`
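
The uncommenting step can also be done without an editor. The following is a minimal sketch using `sed`; `uncomment_range` is a hypothetical helper, and the range 63 to 75 only applies if your `main.tf` matches the one used in the tutorial, so check the lines before overwriting the file.

```bash
# Print file $1 with one leading '#' (and one following space, if any)
# removed from lines $2 through $3. Indentation is preserved and lines
# outside the range are left unchanged.
uncomment_range() {
  sed "${2},${3}s/^\([[:space:]]*\)#[[:space:]]\{0,1\}/\1/" "$1"
}
```

For example: `uncomment_range main.tf 63 75 > main.tf.new && mv main.tf.new main.tf`. Writing to a new file first makes it easy to compare the result with `diff` before replacing the original.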
## Exercise 4
- `nano main.tf`
- Add "proxy" to login1's tags array
- `terraform plan -out=my-plan.zip`
- `terraform apply my-plan.zip`
- `nano data.yaml`
- `source cloud-creds.sh`
- `terraform plan -out=my-plan.zip`
- `terraform apply my-plan.zip`
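
Before planning, it can help to confirm that the tag edit landed where intended. The sketch below assumes each instance in `main.tf` is defined on a single line starting with its key (for example `login = { ..., tags = [...] }`); that layout and the helper name `instance_has_tag` are assumptions, so adjust the pattern if your file differs.

```bash
# Succeed if the line defining instance key $2 in file $1 lists the tag $3.
# Assumes one instance definition per line, e.g.:
#   login = { type = "...", tags = ["login", "public", "proxy"], count = 1 }
instance_has_tag() {
  grep -E "^[[:space:]]*$2[[:space:]]*=" "$1" | grep -q "\"$3\""
}
```

For example: `instance_has_tag main.tf login proxy && echo "proxy tag present"`.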
## Plugs & resources
- Jetstream2
    - https://jetstream-cloud.org
    - NSF-funded OpenStack cloud resources for US-based researchers
    - GPU and large-memory nodes available