
Basic Deep Learning Tasks from CPUs to GPUs

MultiGPU Artificial Intelligence Train-The-Trainer Course — Schedule: https://docs.google.com/document/d/1ztkd5I2k40QetHLwKdnOw4d6Ub_BsrR2epV2dt0wV3E/edit?tab=t.0

Instructors

Schedule

Day 1 (March 3)

| Time | Contents | Instructor(s) |
| --- | --- | --- |
| 14:00-14:20 | Introduction to Deep Learning | YW |
| 14:20-15:15 | Classification by a neural network using Keras (1-6) | AM |
| 15:25-15:45 | Classification by a neural network using Keras (7-10) | YW |
| 15:45-16:00 | Coffee Break (15 min) | |
| 16:00-16:45 | Monitor the training process (1-6) | YW |
| 16:45-17:30 | Monitor the training process (7-10) | AM |

Day 2 (March 4)

| Time | Contents | Instructor(s) |
| --- | --- | --- |
| 09:30-10:20 | Advanced layer types (1-6) | EP |
| 10:20-10:50 | Advanced layer types (7-10) | EP |
| 10:50-11:00 | Outlook and further reading | AM |


Setup programming environment on LB cluster

1. Login to LB cluster

Whether you are a regular LB user or are using a temporary username and password, log in to the LB cluster using the method that applies to your account.

Once you are logged in to LB, you are in your home directory, /leonardo/home/userexternal/<YOUR-USER-ACCOUNT>.

2. Create a working directory and clone files from GitHub

Create a working directory on the login node.

mkdir multiGPU_AI_TTT_course_DL
cd multiGPU_AI_TTT_course_DL/

You can get the files directly from the GitHub repository by running the following commands:

git clone https://github.com/ENCCS/deep-learning-mini.git
cp -fr deep-learning-mini/jupyter-notebook-for-castiel2-course/* .

or you can copy files from the directory for this course with the following commands:

cp -r /leonardo_work/tra25_castiel2/COURSE_MATERIAL/DAY1/afternoon/* .

After running the commands listed above, you will have the following files and folder in your working directory (here, multiGPU_AI_TTT_course_DL):

  • 0_DL_job.sh
  • 1_DL_test.py
  • 2-Classification-NN-Keras-PenguinsClassification.ipynb
  • 3-Monitor-training-process-WeatherPrediction.ipynb
  • 4-Advanced-layer-types-ImageClassification.ipynb
  • penguins_dataset.csv
  • weather_prediction_dataset.csv
  • image_classification (this is a folder with four npy files)
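A quick way to confirm that the copy worked is to check for each expected entry. The following is a minimal sketch, assuming it is run from the working directory; the names are copied from the list above, and the helper `missing_entries` is a hypothetical name introduced here for illustration.

```python
from pathlib import Path

# File and folder names copied from the list above.
EXPECTED = [
    "0_DL_job.sh",
    "1_DL_test.py",
    "2-Classification-NN-Keras-PenguinsClassification.ipynb",
    "3-Monitor-training-process-WeatherPrediction.ipynb",
    "4-Advanced-layer-types-ImageClassification.ipynb",
    "penguins_dataset.csv",
    "weather_prediction_dataset.csv",
    "image_classification",  # folder with the npy files
]


def missing_entries(workdir="."):
    """Return the expected names that are not present in workdir."""
    root = Path(workdir)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All course files are in place.")
```

If anything is reported as missing, rerunning one of the two copy commands above should restore it.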

Submit the job script with sbatch 0_DL_job.sh to request computational resources and to test the programming environment.

  • check the status of the job with the command squeue --me
  • once the job is running, you will see the name of the compute node, e.g. lrdn1234
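If you prefer to pick the node name out of the queue listing programmatically, the sketch below parses text in the default squeue column layout. The sample listing, job IDs and the helper name `running_nodes` are made up for illustration, and the parsing assumes that no field (for example the job name) contains spaces.

```python
def running_nodes(squeue_text):
    """Map job ID to node list for jobs whose state (ST) is R (running)."""
    lines = squeue_text.strip().splitlines()
    if not lines:
        return {}
    header = lines[0].split()
    id_col = header.index("JOBID")
    st_col = header.index("ST")
    node_col = header.index("NODELIST(REASON)")
    nodes = {}
    for line in lines[1:]:
        fields = line.split()
        # Skip short or pending rows; only running jobs have a node assigned.
        if len(fields) > node_col and fields[st_col] == "R":
            nodes[fields[id_col]] = fields[node_col]
    return nodes


# Hypothetical listing in the default squeue layout.
SAMPLE = """\
      JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    1234567 boost_usr  jupyter username  R       0:44      1 lrdn1234
    1234568 boost_usr  jupyter username PD       0:00      1 (Priority)
"""

print(running_nodes(SAMPLE))  # {'1234567': 'lrdn1234'}
```

On the cluster, the text could be obtained with `subprocess.run(["squeue", "--me"], capture_output=True, text=True).stdout` instead of the sample string.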

3. Generating ssh keys for LB cluster

Method 1:

  • Follow the instructions HERE to get access to an LB compute node from your local computer.

Method 2:

  • With 2FA: use smallstep to generate an SSH key for LB
    • step ssh certificate <useremail> --provisioner cineca-hpc <mykey>
    • you will be asked to create a password for mykey
    • after running this command, you will get three files
      • mykey
      • mykey-cert.pub
      • mykey.pub
  • copy the three files above (mykey, mykey-cert.pub and mykey.pub) to the login node of LB.
    • scp mykey* <YOUR-USER-NAME>@login01-ext.leonardo.cineca.it:~/.ssh
  • log in to the login node again and go to the .ssh directory
    • make sure the .ssh folder and the mykey* files are accessible only by you
    • authorize mykey for logins to the compute nodes:
cd .ssh
cat mykey*.pub > authorized_keys
chmod go-rwx authorized_keys
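The `chmod go-rwx` call removes all permissions for group and others. OpenSSH typically refuses key files that other users can modify, so it can be worth confirming the result. The sketch below is one way to check this; the helper names `is_private` and `report` are hypothetical, and the paths in the usage note are assumptions based on the steps above.

```python
import os
import stat


def is_private(path):
    """True if neither group nor others have any permission on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def report(paths):
    """Print one status line per path."""
    for raw in paths:
        path = os.path.expanduser(raw)
        if not os.path.exists(path):
            print(f"{path}: not found")
        elif is_private(path):
            print(f"{path}: ok")
        else:
            print(f"{path}: accessible by group or others")
```

For example, `report(["~/.ssh", "~/.ssh/authorized_keys", "~/.ssh/mykey"])` prints one line for each of the three paths.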

4. Set up 2FA access for both login and compute nodes

Following Step 2, once your job has been allocated, running squeue --me shows output similar to this:

      JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   idxxxxxx boost_usr  jupyter username  R       0:44      1 lrdnxxxx
  • On the login node, forward port 9999 to port 8888 on the compute node: ssh -i ../.ssh/mykey -L 9999:localhost:8888 -N -f lrdnxxxx
  • Open a new terminal on your local computer and forward local port 7777 to port 9999 on the login node: ssh -L 7777:localhost:9999 -N -f <YOUR-USER-NAME>@login01-ext.leonardo.cineca.it. The login-node port (9999 here) must be the same in both commands, otherwise the two tunnels do not connect.
  • Open Jupyter Lab/Notebook in a browser on your local computer at http://127.0.0.1:7777/lab?token=tokenxxxxxxxxxxxxxxxxxx, where the token can be found in the slurm-idxxxxxx.out file.
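The two tunnels form a chain: local computer, then login node, then compute node. The sketch below assembles both ssh commands from a single set of ports, assuming the login-node port is the same in both hops (9999 in this sketch), and rebuilds the local URL from a log that contains a `token=...` string. The function names, the default values and the assumed log format are illustrative, not taken from the course scripts.

```python
import re


def tunnel_commands(node, user, key="../.ssh/mykey", local_port=7777,
                    login_port=9999, jupyter_port=8888,
                    login_host="login01-ext.leonardo.cineca.it"):
    """Return the two ssh commands; login_port is shared by both hops."""
    # Hop 1, run on the login node: login_port -> jupyter_port on the compute node.
    hop1 = f"ssh -i {key} -L {login_port}:localhost:{jupyter_port} -N -f {node}"
    # Hop 2, run on the local computer: local_port -> login_port on the login node.
    hop2 = f"ssh -L {local_port}:localhost:{login_port} -N -f {user}@{login_host}"
    return hop1, hop2


def local_jupyter_url(slurm_log_text, local_port=7777):
    """Build the local URL from the first token found in the log, or None."""
    match = re.search(r"token=(\w+)", slurm_log_text)
    if match is None:
        return None
    return f"http://127.0.0.1:{local_port}/lab?token={match.group(1)}"


for command in tunnel_commands("lrdnxxxx", "<YOUR-USER-NAME>"):
    print(command)
```

Passing the contents of the slurm-idxxxxxx.out file to `local_jupyter_url` gives the address to open in the browser, provided the log contains the token in the assumed form.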

You can ask questions about the workshop content at the bottom of this page. We use the Zoom chat only for reporting Zoom problems and such.

Q/A

  • Is this how to ask a question?
    • Yes, and an answer will appear like so!

1. Introduction


Check-in

Do you have Jupyter Lab up and running?

Add a + to cast your vote

  • Yes:+++++
  • No: ++

Pairplot

  • Is there any class that is easily distinguishable from the others?

    Add a + to cast your vote

    • Adelie: + (easy to separate Adelie from Gentoo)
    • Chinstrap:
    • Gentoo: ++++
  • Which combination of attributes shows the best separation for all 3 class
    labels at once?

One-hot encoding

How many output neurons will our network have now that we one-hot encoded the target class?

Add a + to cast your vote

  • A: 1
  • B: 2 +
  • C: 3 ++
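To see what one-hot encoding does to the target, here is a dependency-free sketch on a handful of made-up labels using the three penguin species from the pairplot poll. The helper `one_hot` is a hypothetical name; in practice a library function such as `pandas.get_dummies` produces the same kind of table.

```python
def one_hot(labels):
    """Return the sorted class names and one 0/1 row per label."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1  # mark the column of this label's class
        rows.append(row)
    return classes, rows


# Made-up sample of species labels.
species = ["Adelie", "Gentoo", "Chinstrap", "Adelie"]
classes, encoded = one_hot(species)

print(classes)          # ['Adelie', 'Chinstrap', 'Gentoo']
print(encoded)          # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(len(encoded[0]))  # 3
```

Each class becomes its own column, so a network that predicts this target needs one output per column.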

Always ask questions at the very bottom of this document, right above this.