Basic Deep Learning Tasks from CPUs to GPUs

MultiGPU Artificial Intelligence Train-The-Trainer Course — Schedule: https://docs.google.com/document/d/1ztkd5I2k40QetHLwKdnOw4d6Ub_BsrR2epV2dt0wV3E/edit?tab=t.0

Instructors

Ashwin Mohanan (AM), ENCCS/RISE
Elena-Anca Praschiv, (EP), ICI Bucharest
Yonglei Wang (YW), ENCCS/LiU

Schedule

Day 1 (March 3)

Time	Contents	Instructor(s)
14:00-14:20	Introduction to Deep Learning	YW
14:20-15:15	Classification by a neural network using Keras (1-6)	AM
15:25-15:45	Classification by a neural network using Keras (7-10)	YW

15:45-16:00	Coffee Break (15 min)

16:00-16:45	Monitor the training process (1-6)	YW
16:45-17:30	Monitor the training process (7-10)	AM

Day 2 (March 4)

Time	Contents	Instructor(s)
09:30-10:20	Advanced layer types (1-6)	EP
10:20-10:50	Advanced layer types (7-10)	EP
10:50-11:00	Outlook and further reading	AM

Useful Links

Set up the programming environment on the LB cluster

1. Login to LB cluster

Whether you are a regular LB user or are using a temporary username and password, log in to the LB cluster with the method that applies to your account.

After logging in, you are in your home directory, /leonardo/home/userexternal/<YOUR-USER-ACCOUNT>.

2. Create a working directory and clone files from GitHub

Create a working directory on the login node:

mkdir multiGPU_AI_TTT_course_DL
cd multiGPU_AI_TTT_course_DL/

You can get the files directly from the GitHub repository by running:

git clone https://github.com/ENCCS/deep-learning-mini.git
cp -fr deep-learning-mini/jupyter-notebook-for-castiel2-course/* .

Alternatively, copy the files from the course directory:

cp -r /leonardo_work/tra25_castiel2/COURSE_MATERIAL/DAY1/afternoon/* .

After running either set of commands, your working directory (here multiGPU_AI_TTT_course_DL) will contain the following files and folder:

0_DL_job.sh
1_DL_test.py
2-Classification-NN-Keras-PenguinsClassification.ipynb
3-Monitor-training-process-WeatherPrediction.ipynb
4-Advanced-layer-types-ImageClassification.ipynb
penguins_dataset.csv
weather_prediction_dataset.csv
image_classification/ (a folder containing four .npy files)
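To confirm that the clone or copy worked, a short Python script can compare your working directory against the list above. This is a hypothetical helper (`missing_course_files` is not part of the course material); it assumes only the file names listed here and checks that `image_classification` holds four `.npy` files, without assuming what those files are called.

```python
from pathlib import Path

# Expected contents of the working directory, as listed above.
EXPECTED = [
    "0_DL_job.sh",
    "1_DL_test.py",
    "2-Classification-NN-Keras-PenguinsClassification.ipynb",
    "3-Monitor-training-process-WeatherPrediction.ipynb",
    "4-Advanced-layer-types-ImageClassification.ipynb",
    "penguins_dataset.csv",
    "weather_prediction_dataset.csv",
    "image_classification",
]


def missing_course_files(workdir="."):
    """Return a list of expected entries that are missing from workdir."""
    root = Path(workdir)
    missing = [name for name in EXPECTED if not (root / name).exists()]
    # The image_classification folder should contain four .npy files.
    npy_dir = root / "image_classification"
    if npy_dir.is_dir() and len(list(npy_dir.glob("*.npy"))) != 4:
        missing.append("image_classification/*.npy (expected 4 files)")
    return missing


if __name__ == "__main__":
    problems = missing_course_files()
    print("All course files present." if not problems else f"Missing: {problems}")
```

Save it in the working directory (for example as check_files.py, a hypothetical name) and run python check_files.py on the login node; an empty result means everything is in place.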

Submit the job script with sbatch 0_DL_job.sh to request computational resources and test the programming environment.

Check the status of the job with squeue --me.
Once the job is running, you will see the name of the compute node, such as lrdn1234.

3. Generating ssh keys for LB cluster

Method 1:

Follow the instructions HERE to access an LB compute node from your local computer.

Method 2:

Use the smallstep client (step) with 2FA to generate an SSH key for LB:
- step ssh certificate <useremail> --provisioner cineca-hpc <mykey>
- you will be asked to create a password for mykey
- after running this command, you will get three files
  - mykey
  - mykey-cert.pub
  - mykey.pub
Copy the three files above to the login node of LB:
- scp mykey* <YOUR-USER-NAME>@login01-ext.leonardo.cineca.it:~/.ssh
Log in to the login node again and go to the .ssh directory:
- set permissions for the .ssh folder and the mykey* files
- use mykey to authorize logins to the compute nodes

cd ~/.ssh
chmod 700 ~/.ssh
chmod 600 mykey
cat mykey*.pub > authorized_keys
chmod go-rwx authorized_keys

Following step 2, once your job is allocated you should see something similar when you run squeue --me:

      JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   idxxxxxx boost_usr  jupyter username  R       0:44      1 lrdnxxxx
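If you prefer not to read the NODELIST column by eye, the compute node name can be extracted from the squeue --me output with a few lines of Python. This is a minimal sketch that assumes the default squeue column layout shown above; running_nodes is a hypothetical helper, not part of the course files.

```python
def running_nodes(squeue_output, job_name=None):
    """Return the NODELIST entries of running jobs (ST == "R") in squeue output.

    Assumes the default `squeue --me` layout: a header line followed by one
    whitespace-separated row per job, with NODELIST(REASON) as the last column.
    """
    lines = squeue_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    st_col = header.index("ST")
    name_col = header.index("NAME")
    nodes = []
    for line in lines[1:]:
        fields = line.split()
        if not fields or fields[st_col] != "R":
            continue  # skip blank lines and jobs that are not running yet
        if job_name is not None and fields[name_col] != job_name:
            continue
        # For a running job the last column holds the allocated node name(s).
        nodes.append(fields[-1])
    return nodes
```

On the login node you could pass it the live output, e.g. running_nodes(subprocess.run(["squeue", "--me"], capture_output=True, text=True).stdout, job_name="jupyter"), and use the returned node in place of lrdnxxxx in the next step.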

On the login node, forward a port to the Jupyter server on the compute node: ssh -i ~/.ssh/mykey -L 9777:localhost:8888 -N -f lrdnxxxx
Open a new terminal on your local computer and forward a local port to the same port on the login node: ssh -L 7777:localhost:9777 -N -f <YOUR-USER-NAME>@login01-ext.leonardo.cineca.it
In a browser on your local computer, open http://127.0.0.1:7777/lab?token=tokenxxxxxxxxxxxxxxxxxx, where the token can be found in the slurm-idxxxxxx.out file.
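Looking up the token can also be scripted. The sketch below assumes the Jupyter server writes its startup URL (containing ?token=...) to the Slurm output file, as Jupyter normally does at startup; the helper names are hypothetical, and the default local port 7777 matches the tunnel from your local computer.

```python
import re
from pathlib import Path

# Matches the token query parameter in the URLs Jupyter prints at startup,
# e.g. "http://127.0.0.1:8888/lab?token=abc123".
TOKEN_RE = re.compile(r"[?&]token=([0-9A-Za-z]+)")


def find_jupyter_token(log_text):
    """Return the first Jupyter token found in the log text, or None."""
    match = TOKEN_RE.search(log_text)
    return match.group(1) if match else None


def local_lab_url(token, local_port=7777):
    """Build the URL to open in the browser on your local computer."""
    return f"http://127.0.0.1:{local_port}/lab?token={token}"


def url_from_slurm_log(path, local_port=7777):
    """Read a slurm-<jobid>.out file and return the local JupyterLab URL."""
    token = find_jupyter_token(Path(path).read_text())
    if token is None:
        raise ValueError(f"No Jupyter token found in {path}; is the server up yet?")
    return local_lab_url(token, local_port)
```

Run it on the login node, e.g. print(url_from_slurm_log("slurm-idxxxxxx.out")), then paste the printed URL into the browser on your local computer.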

You can ask questions about the workshop content at the bottom of this page. We use the Zoom chat only for reporting Zoom problems and such.

Q/A

Is this how to ask a question?
- Yes, and an answer will appear like so!

1. Introduction


Check-in

Do you have Jupyter Lab up and running?

Add a + to cast your vote

Yes:+++++
No: ++

Pairplot

Is there any class that is easily distinguishable from the others?

Add a + to cast your vote
- Adelie: + (Adelie is easy to separate from Gentoo)
- Chinstrap:
- Gentoo: ++++
Which combination of attributes shows the best separation for all 3 class labels at once?
- …
- …
- …

One-hot encoding

How many output neurons will our network have now that we one-hot encoded the target class?

Add a + to cast your vote

A: 1
B: 2 +
C: 3 ++
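One-hot encoding turns the single species column into one column per class, and the network needs one output neuron per column. A minimal pandas sketch, using a handful of made-up labels rather than the real penguins_dataset.csv, makes the count explicit:

```python
import pandas as pd

# A few example labels; in the notebook these come from the "species" column
# of penguins_dataset.csv.
species = pd.Series(["Adelie", "Gentoo", "Chinstrap", "Adelie"], name="species")

# One column per class; each row has exactly one True in its species' column.
target = pd.get_dummies(species)
print(target.columns.tolist())  # ['Adelie', 'Chinstrap', 'Gentoo']

# The output layer needs one neuron per one-hot column.
n_output_neurons = target.shape[1]
print(n_output_neurons)  # 3
```

With the three penguin species (Adelie, Chinstrap, Gentoo), the network therefore has 3 output neurons, i.e. answer C.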

Always ask questions at the very bottom of this document, right above this.
