
<p style="text-align: center"><b><font size=5 color=blue>Practical Deep Learning - Day 2</font></b></p>
:::success
**Practical Deep Learning — Schedule**: https://hackmd.io/@yonglei/practical-deep-learning-schedule-2025
:::
## Schedule
| Time | Contents | Instructor(s) |
| :---------: | :------: | :-----------: |
| 09:00-09:10 | Welcome and recap | YW |
| 09:10-10:00 | Monitor the training process | YW |
| 10:00-10:10 | Coffee Break | |
| 10:10-11:00 | Advanced Layer Types | AM |
| 11:00-11:10 | Coffee Break | |
| 11:10-11:50 | Transfer learning & Outlook | AM |
| 11:50-12:00 | Wrap-up | |
---
## Setup your environment
### LUMI
Go to Open On-demand interface: <https://www.lumi.csc.fi/pun/sys/dashboard/>
==Choose Jupyter==
:::danger
WARNING: The container had to be upgraded, so some parameters have changed. Look out for the :loudspeaker: emoji. Also, :no_entry_sign: means you should leave that field *blank*.
:::
:::info
* **Project:** project_465001310
* **Partition:** small-g :loudspeaker:
#### Resources
* **Number of CPU cores:** 8
* **Memory (GiB):** 8
* **Number of GPUs:** 1
* **Time:** 4:00:00 :loudspeaker:
#### Settings
* **Working directory:** /scratch/project_465001310
* **Show advanced settings:** :heavy_check_mark:
* **Custom Python type:** container
* **Modules to load:** :no_entry_sign: :loudspeaker:
* **Path to container with Python:**
```
/scratch/project_465001310/env-deep-learning-intro/container.sif
```
* **Container arguments:** :no_entry_sign: :loudspeaker:
* **Init script for container:**
```
/scratch/project_465001310/env-deep-learning-intro/init_script.sh
```
* **Enable virtual environment:** :no_entry_sign:
* **Save settings**: Give it a name, like `deep-learning-intro`, so that you can use it later!
:::
==Launch Jupyter, and change to your directory==
The init script should have created a working directory for you under the _scratch directory_ at `env-deep-learning-intro/workspace/$USER/deep-learning-intro/notebooks/lumi`
==Get the new notebooks!==
Either run `git pull` from the terminal or use the JupyterLab git interface.
- Navigate to the directory containing the notebooks
- Open the Git panel in the left sidebar, then click the *cloud button with a down arrow* to pull the latest changes

==How to use `TensorBoard` on LUMI==

---
:::danger
You can ask questions about the workshop content at the bottom of this page. We use the Zoom chat only for reporting Zoom problems and such.
:::
## Questions, answers and information
- Is this how to ask a question?
- Yes, and an answer will appear like so!
### ==3. Monitor the training process==
- I wonder if it is common to combine two optimizers, for example, use "adam" until the loss function reaches a given threshold, and then switch to something better when one is closer to the minimum.
    - Optimizers are initialized when training starts, so it is not easy to switch between them mid-training. What you can do instead is use a learning-rate *scheduler* rather than the default fixed learning rate. Something like a cosine schedule is commonly used: start with a high learning rate and then reduce it progressively.
    - Thank you.
    - As a side note, the Adam optimizer is actually a combination of two older techniques: RMSprop and momentum. :+1:
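The cosine schedule mentioned above can be sketched in plain Python (a minimal illustration; the function name and default rates here are made up — for real use, Keras ships a ready-made `keras.optimizers.schedules.CosineDecay`):

```python
import math

def cosine_lr(epoch, total_epochs, lr_max=1e-3, lr_min=1e-5):
    """Cosine-annealed learning rate: starts at lr_max, decays to lr_min."""
    cos = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)

# The schedule starts high and ends low:
# cosine_lr(0, 100) == lr_max, cosine_lr(100, 100) == lr_min
```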
- Why have we chosen 200 epochs to begin with?
    - This is arbitrary, chosen to demonstrate overfitting; but as you saw at the end of the lesson, using `EarlyStopping` ends the training well before 200 epochs. That is the way to go.
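The logic behind `EarlyStopping` can be sketched without Keras (a simplified illustration, not the actual Keras implementation; the function name is made up):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch (index) at which training would stop, or
    len(val_losses) if it runs to the end. Training stops once the
    validation loss has failed to improve for `patience` epochs."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # stop here instead of running all epochs
    return len(val_losses)
```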
- My question is: do you always aim for the global minimum? Is the algorithm able to test the whole landscape, or only a certain area around the current location? If we only reach a local minimum rather than the global one, is the result then incorrect?
    - The aim of training is to find the global minimum, but in reality it is difficult to reach if the model is complex.
    - So our aim is to get as close to the global minimum as possible.
    - Would you then perform the search x times and average the result?
        - That is not a good option. A better way is to explore the parameter space and settle for a "minimum", even if it is not the global minimum.
        - More explanation: in theory we want the global minimum, but in practice we rarely reach it.
        - Instead, we settle for a local minimum, or some "good" local minimum, that gives good performance in prediction.
- How do you explore this in an automatic way? That is not so clear to me. And when do you stop your search?
    - We start at some point in the parameter space and use an optimization algorithm (GD, SGD, mini-batch GD, *etc.*), following the local gradient/slope of the loss function to descend to a nearby minimum.
    - Then we evaluate the model; if the set of parameters is not good, we fine-tune the model until we get good performance on the test dataset.
    - There are metrics (like the F1 score) to validate the evaluation.
    - We may end up with many candidate parameter sets, and we pick the one that gives the lowest loss, as it is the most promising.
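The "descend to a nearby minimum" idea above can be illustrated with plain gradient descent on a toy 1-D loss (illustrative values only):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Follow the negative gradient from a starting point x0."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Toy loss L(x) = (x - 2)**2 with gradient 2*(x - 2); starting anywhere,
# the iterates descend to the (here unique) minimum at x = 2.
grad = lambda x: 2.0 * (x - 2.0)
x_min = gradient_descent(grad, x0=-5.0)
```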
- Could you explain the term "baseline" in "baseline prediction"?
    - The baseline is the reference point against which we evaluate the trained model.
    - If the trained model outperforms the baseline (i.e. achieves a smaller loss), we can say that the trained model is acceptable.
    - If the trained model cannot beat the line set by the baseline (that is why it is called a **baseline**), it is not a good model.
    - Then we have two options: abandon the model, or try to improve it with one of many strategies.
    - We provided three exercises at Step 9, where you can explore each strategy for improving the model.
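A minimal sketch of a baseline comparison, here using the mean of the targets as the baseline predictor for a regression task (the numbers are made up):

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0, 9.0]
model_pred = [2.8, 5.1, 6.9, 9.2]                          # hypothetical model output
baseline_pred = [sum(y_true) / len(y_true)] * len(y_true)  # always predict the mean

model_loss = mse(y_true, model_pred)
baseline_loss = mse(y_true, baseline_pred)
# The model is acceptable only if model_loss < baseline_loss.
```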
### ==4. Advanced layer types==
#### Number of features in Dollar Street 10
How many features does one image in the Dollar Street 10 dataset have?
- A. 64
- B. 4096
- C. 12288 +++++
- D. 878
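Answer C follows from the image shape: Dollar Street 10 images are 64×64 pixels with 3 colour channels, so flattening one image gives:

```python
# One Dollar Street 10 image: height x width x colour channels
height, width, channels = 64, 64, 3
n_features = height * width * channels  # 64 * 64 * 3 = 12288
```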
- Building a CNN looks like it uses stencil computations, doesn't it?
    - Yes, you are absolutely right.
    - Building and executing a CNN does involve stencil-like computations, particularly in the convolutional layers.
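The stencil analogy can be made concrete: a convolutional layer slides a small kernel over the image, computing a weighted sum of each neighbourhood. A naive "valid" 2-D convolution (no padding or striding; illustrative only):

```python
def conv2d(image, kernel):
    """Naive 'valid' 2-D convolution (really cross-correlation,
    as in most deep-learning frameworks)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# A 3x3 averaging kernel applied to a 4x4 image gives a 2x2 output;
# the top-left output value is the mean of the top-left 3x3 patch (6.0).
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1 / 9] * 3] * 3
out = conv2d(image, kernel)
```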
- I've seen Google use images for their CAPTCHAs. Some of them have dark edges or are over-saturated. Why black edges, though? To confuse the model?
    - Dark edges (or high contrast) in CAPTCHAs can be a deliberate design choice to confuse automated models, especially CNNs, which are good at pattern recognition.
    - Yes, black edges (or dark borders) are used to confuse the model.
    - The reason is that CNNs are sensitive to edges and contrast.
- Could you explain in more detail the relation between accuracy and loss? Is it something similar to bias vs. variance?
    - Accuracy is indirectly related to the loss: the lower the loss, the higher the accuracy. Of course, there is no upper bound for the loss, whereas accuracy gives you a metric between 0 and 1. Does that make sense? :+1:
    - Small correction: accuracy is used for classification problems. See https://developers.google.com/machine-learning/crash-course/classification/accuracy-precision-recall
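The distinction can be seen on a tiny binary-classification example: accuracy only checks the thresholded predictions, while the cross-entropy loss also measures how confident they are (the numbers are made up):

```python
import math

y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.2, 0.6, 0.4]  # hypothetical predicted probabilities of class 1

# Accuracy: fraction of thresholded predictions that match the labels.
accuracy = sum((p >= 0.5) == t for t, p in zip(y_true, y_prob)) / len(y_true)

# Binary cross-entropy: penalizes confident wrong answers heavily.
loss = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
            for t, p in zip(y_true, y_prob)) / len(y_true)

# accuracy is bounded in [0, 1]; the loss is unbounded above.
```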
#### Network depth: Try it on your own!
https://enccs.github.io/deep-learning-intro/4-advanced-layer-types/#refine-the-model
### ==5. Transfer learning & Outlook==
Keras Applications
- https://keras.io/api/applications/
- https://huggingface.co/models
- https://paperswithcode.com/sota
Which metric should we choose to validate the trained model?
- One option is `balanced accuracy`
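Balanced accuracy is the mean of per-class recall, which matters when classes are imbalanced (a plain-Python sketch; `sklearn.metrics.balanced_accuracy_score` computes the same thing):

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (fraction of each class predicted correctly)."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(y_pred[i] == c for i in idx)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# With 9 samples of class 0 and 1 of class 1, always predicting the
# majority class scores 90% plain accuracy but only 50% balanced accuracy.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
acc = sum(p == t for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
```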

:::danger
*Always ask questions at the very bottom of this document, right **above** this.*
:::
---