<p style="text-align: center"><b><font size=5 color=blueyellow>Basic Deep Learning Tasks from CPUs to GPUs</font></b></p>

:::success
**MultiGPU Artificial Intelligence Train-The-Trainer Course — Schedule**: https://docs.google.com/document/d/1ztkd5I2k40QetHLwKdnOw4d6Ub_BsrR2epV2dt0wV3E/edit?tab=t.0
:::

## Instructors

- [Ashwin Mohanan](https://enccs.se/ashwin-mohanan) (AM), ENCCS/RISE
- [Elena-Anca Paraschiv](https://www.linkedin.com/in/elena-anca-paraschiv-367a0218a/) (EP), ICI Bucharest
- [Yonglei Wang](https://enccs.se/yonglei-wang) (YW), ENCCS/LiU

## Schedule

### <font color=red>Day 1 (March 3)</font>

| Time | Contents | Instructor(s) |
| :---------: | :------: | :-----------: |
| 14:00-14:20 | [Introduction to Deep Learning](https://enccs.github.io/deep-learning-mini/1-introduction/) | YW |
| 14:20-15:15 | [Classification by a neural network using Keras (1-6)](https://enccs.github.io/deep-learning-mini/2-keras/) | AM |
| 15:25-15:45 | [Classification by a neural network using Keras (7-10)](https://enccs.github.io/deep-learning-mini/2-keras/#perform-a-prediction-classification) | YW |
| | | |
| 15:45-16:00 | Coffee Break (15 min) | |
| | | |
| 16:00-16:45 | [Monitor the training process (1-6)](https://enccs.github.io/deep-learning-mini/3-monitor-the-model/) | YW |
| 16:45-17:30 | [Monitor the training process (7-10)](https://enccs.github.io/deep-learning-mini/3-monitor-the-model/#perform-a-prediction-classification) | AM |

### <font color=red>Day 2 (March 4)</font>

| Time | Contents | Instructor(s) |
| :---------: | :------: | :-----------: |
| 09:30-10:20 | [Advanced layer types (1-6)](https://enccs.github.io/deep-learning-mini/4-advanced-layer-types/) | EP |
| 10:20-10:50 | [Advanced layer types (7-10)](https://enccs.github.io/deep-learning-mini/4-advanced-layer-types/#perform-a-prediction-classification) | EP |
| 10:50-11:00 | [Outlook and further reading](https://enccs.github.io/deep-learning-intro/6-outlook/) | AM |

---

## Useful Links

- [Lesson
material](https://enccs.github.io/deep-learning-mini/)
- [Login to LB cluster (using temporary username)](https://hackmd.io/@yonglei/login-lb-temp-username)
- [Login to LB cluster (regular users)](https://hackmd.io/@yonglei/login-lb-normal-username)
- [Google docs](https://drive.google.com/drive/folders/1GqULIbJ5wJsvUk6zgu9fFDCkvfmOtjQN)

---

## ==Setup programming environment on LB cluster==

:::info

### 1. Login to LB cluster

Whether you are a regular LB user or are using a temporary username and password, log in to the LB cluster with the method that matches your account (see the two login guides under Useful Links).

After logging in, you will be in `/leonardo/home/userexternal/<YOUR-USER-ACCOUNT>`.

### 2. Create a working directory and clone files from GitHub

Create a working directory on the login node.
```
mkdir multiGPU_AI_TTT_course_DL
cd multiGPU_AI_TTT_course_DL/
```

You can get the files directly from the GitHub repository by running the following commands:
```
git clone https://github.com/ENCCS/deep-learning-mini.git
cp -fr deep-learning-mini/jupyter-notebook-for-castiel2-course/* .
```

Alternatively, copy the files from the course directory on the cluster:
```
cp -r /leonardo_work/tra25_castiel2/COURSE_MATERIAL/DAY1/afternoon/* .
```

After running the commands above, your working directory (here `multiGPU_AI_TTT_course_DL`) will contain the following files and folder:
- 0_DL_job.sh
- 1_DL_test.py
- 2-Classification-NN-Keras-PenguinsClassification.ipynb
- 3-Monitor-training-process-WeatherPrediction.ipynb
- 4-Advanced-layer-types-ImageClassification.ipynb
- penguins_dataset.csv
- weather_prediction_dataset.csv
- image_classification (this is a folder with four `npy` files)

Submit the job script with `sbatch 0_DL_job.sh` to request computational resources and test the programming environment.
- check the status of the job with the command `squeue --me`
- once the job is running, you will see the name of the compute node, like `lrdn1234`

### 3. Generating SSH keys for LB cluster

Method 1:
- Follow instructions [HERE]() to get access to an LB compute node from your local computer.

Method 2:
- Using 2FA: use `smallstep` to generate an SSH key for LB
    - `step ssh certificate <useremail> --provisioner cineca-hpc <mykey>`
    - you will be asked to create a password for `mykey`
    - after running this command, you will get three files
        - `mykey`
        - `mykey-cert.pub`
        - `mykey.pub`
- copy the three key files above to the login node of LB
    - `scp mykey* <YOUR-USER-NAME>@login01-ext.leonardo.cineca.it:~/.ssh`
- log in to the login node again and go to the `.ssh` directory
    - set the permissions for the `.ssh` folder and the `mykey*` files
    - use `mykey` to authorize logins to compute nodes
```
cd .ssh
cat mykey*.pub > authorized_keys
chmod go-rwx authorized_keys
```

### 4. Set up 2FA access for both login and compute nodes

Once the job submitted in Step 2 has been allocated, running `squeue --me` shows output similar to this:
```
JOBID     PARTITION  NAME     USER      ST  TIME  NODES  NODELIST(REASON)
idxxxxxx  boost_usr  jupyter  username  R   0:44  1      lrdnxxxx
```

- On the login node, forward port 9777 to port 8888 on your compute node (replace `lrdnxxxx` with the node name reported by `squeue --me`): `ssh -i ../.ssh/mykey -L 9777:localhost:8888 -N -f lrdnxxxx`
- ==Open a new terminal on your local computer== and set up SSH port forwarding from local port 7777 to the same port 9777 on the login node: `ssh -L 7777:localhost:9777 -N -f <YOUR-USER-NAME>@login01-ext.leonardo.cineca.it`
- Open Jupyter lab/notebook in a browser on your local computer at `http://127.0.0.1:7777/lab?token=tokenxxxxxxxxxxxxxxxxxx`; the token is available in the `slurm-idxxxxxx.out` file.

:::

---

:::danger
You can ask questions about the workshop content at the bottom of this page. We use the Zoom chat only for reporting Zoom problems and such.
:::

## Q/A

- Is this how to ask a question?
    - Yes, and an answer will appear like so!

### 1. Introduction

![neutron-in-DL](https://hackmd.io/_uploads/ByG8FvGo1e.png)

### Check-in

Do you have Jupyter Lab up and running?
*Add a `+` to cast your vote*
- Yes:+++++
- No: ++

### ==Pairplot==

- Is there any class that is easily distinguishable from the others? *Add a `+` to cast your vote*
    - Adelie: + (easy to separate Adelie from Gentoo)
    - Chinstrap:
    - Gentoo: ++++
- Which combination of attributes shows the best separation for all 3 class labels at once?
    - ...
    - ...
    - ...

### ==One-hot encoding==

How many output neurons will our network have now that we one-hot encoded the target class? *Add a `+` to cast your vote*

A: 1
B: 2 +
C: 3 ++

---

:::info
*Always ask questions at the very bottom of this document, right **above** this.*
:::

---
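
For the one-hot encoding poll above, the following is a minimal NumPy-only sketch of what one-hot encoding does to a three-class target. It is not the code used in the lesson notebooks, and the four sample labels are illustrative values rather than rows read from `penguins_dataset.csv`. The number of columns in the encoded array is the number of classes, which is also the number of output neurons a classifier needs for such a target.

```python
import numpy as np

# Illustrative species labels (assumed for this sketch, not read from the dataset file).
species = np.array(["Adelie", "Gentoo", "Chinstrap", "Adelie"])

# Sorted unique class names, plus the integer class index of every sample.
classes, class_index = np.unique(species, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with class_index gives one one-hot row per sample.
one_hot = np.eye(len(classes), dtype=int)[class_index]

print(classes)
print(one_hot)
print("columns (one per class):", one_hot.shape[1])
```

Each row contains a single 1 in the column of its class, so every sample is described by as many values as there are classes.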