
<p style="text-align: center"><b><font size=5 color=blue>Practical Deep Learning - Day 2</font></b></p>
:::success
**Practical Deep Learning — Schedule**: https://hackmd.io/@yonglei/practical-deep-learning-schedule-2025
:::
## Schedule
| Time | Contents | Instructor(s) |
| :---------: | :------: | :-----------: |
| 09:00-09:10 | Welcome and recap | YW |
| 09:10-10:00 | Monitor the training process | YW |
| 10:00-10:10 | Coffee Break | |
| 10:10-11:00 | Advanced Layer Types | AM |
| 11:00-11:10 | Coffee Break | |
| 11:10-11:50 | Transfer learning & Outlook | AM |
| 11:50-12:00 | Wrap-up | |
---
## Setup your environment
### LUMI
Go to Open On-demand interface: <https://www.lumi.csc.fi/pun/sys/dashboard/>
==Choose Jupyter==
:::danger
WARNING: The container had to be upgraded, so some parameters have changed. Look out for the :loudspeaker: emoji. Also, :no_entry_sign: means you should leave that field *blank*.
:::
:::info
* **Project:** project_465001310
* **Partition:** small-g :loudspeaker:
#### Resources
* **Number of CPU cores:** 8
* **Memory (GiB):** 8
* **Number of GPUs:** 1
* **Time:** 4:00:00 :loudspeaker:
#### Settings
* **Working directory:** /scratch/project_465001310
* **Show advanced settings:** :heavy_check_mark:
* **Custom Python type:** container
* **Modules to load:** :no_entry_sign: :loudspeaker:
* **Path to container with Python:**
```
/scratch/project_465001310/env-deep-learning-intro/container.sif
```
* **Container arguments:** :no_entry_sign: :loudspeaker:
* **Init script for container:**
```
/scratch/project_465001310/env-deep-learning-intro/init_script.sh
```
* **Enable virtual environment:** :no_entry_sign:
* **Save settings**: Give it a name, like `deep-learning-intro`, so that you can use it later!
:::
==Launch Jupyter, and change to your directory==
The init script should have created a working directory for you under the _scratch directory_ at `env-deep-learning-intro/workspace/$USER/deep-learning-intro/notebooks/lumi`
==Get the new notebooks!==
Either run `git pull` from the terminal or use the JupyterLab git interface.
- Navigate to the directory containing the notebooks
- Open the Git panel in the left sidebar, then click the *cloud button with a down arrow* to pull the latest changes

==How to use `TensorBoard` on LUMI==

---
:::danger
You can ask questions about the workshop content at the bottom of this page. We use the Zoom chat only for reporting Zoom problems and such.
:::
## Questions, answers and information
- Is this how to ask a question?
- Yes, and an answer will appear like so!
### ==3. Monitor the training process==
- I wonder if it is common to combine two optimizers, for example, use "adam" until the loss function reaches a given threshold, and then switch to something better when one is closer to the minimum.
    - Optimizers are initialized when training starts, so it is not easy to switch between them mid-training. What you can do instead is use a learning-rate *scheduler* rather than the default fixed learning rate. Something like a cosine schedule is commonly used: start with a high learning rate and then reduce it progressively.
    - Thank you.
    - As a side note, the Adam optimizer is actually a combination of two older techniques: RMSprop and momentum. :+1:
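The cosine schedule mentioned above can be sketched in plain Python (a minimal illustration; the function name and default rates here are made up — for real use, Keras ships a ready-made `keras.optimizers.schedules.CosineDecay`):

```python
import math

def cosine_lr(epoch, total_epochs, lr_max=1e-3, lr_min=1e-5):
    """Cosine-annealed learning rate: starts at lr_max, decays to lr_min."""
    cos = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)

# The schedule starts high and ends low:
# cosine_lr(0, 100) == lr_max, cosine_lr(100, 100) == lr_min
```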
- Why have we chosen 200 epochs to begin with?
    - This is arbitrary, chosen to demonstrate overfitting; but as you saw at the end of the lesson, using `EarlyStopping` ends the training well before 200 epochs. That is the way to go.
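The logic behind `EarlyStopping` can be sketched without Keras (a simplified illustration, not the actual Keras implementation; the function name is made up):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch (index) at which training would stop, or
    len(val_losses) if it runs to the end. Training stops once the
    validation loss has failed to improve for `patience` epochs."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # stop here instead of running all epochs
    return len(val_losses)
```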
- My question is: do you always aim for the global minimum? Is the algorithm able to test the whole landscape, or only a certain area around the current location? If we only reach a local minimum rather than the global one, is the result then incorrect?
    - The aim of training is to find the global minimum, but in reality it is difficult to reach if the model is complex.
    - So our aim is to get as close to the global minimum as possible.
    - Would you then perform the search x times and average the result?
        - That is not a good option. A better way is to explore the parameter space and settle for a "minimum", even if it is not the global minimum.
        - More explanation: in theory we want the global minimum, but in practice we rarely reach it.
        - Instead, we settle for a local minimum, or some "good" local minimum, that gives good performance in prediction.
- How do you explore this in an automatic way? That is not so clear to me. And when do you stop your search?
    - We start at some point in the parameter space and use an optimization algorithm (GD, SGD, mini-batch GD, *etc.*), following the local gradient/slope of the loss function to descend to a nearby minimum.
    - Then we evaluate the model; if the set of parameters is not good, we fine-tune the model until we get good performance on the test dataset.
    - There are metrics (like the F1 score) to validate the evaluation.
    - We may end up with many candidate parameter sets, and we pick the one that gives the lowest loss, as it is the most promising.
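The "descend to a nearby minimum" idea above can be illustrated with plain gradient descent on a toy 1-D loss (illustrative values only):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Follow the negative gradient from a starting point x0."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Toy loss L(x) = (x - 2)**2 with gradient 2*(x - 2); starting anywhere,
# the iterates descend to the (here unique) minimum at x = 2.
grad = lambda x: 2.0 * (x - 2.0)
x_min = gradient_descent(grad, x0=-5.0)
```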
- Could you explain the term "baseline" in "baseline prediction"?
    - The baseline is the reference point against which we evaluate the trained model.
    - If the trained model outperforms the baseline (i.e. achieves a smaller loss), we can say that the trained model is acceptable.
    - If the trained model cannot beat the line set by the baseline (that is why it is called a **baseline**), it is not a good model.
    - Then we have two options: abandon the model, or try to improve it with one of many strategies.
    - We provided three exercises at Step 9, where you can explore each strategy for improving the model.
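A minimal sketch of a baseline comparison, here using the mean of the targets as the baseline predictor for a regression task (the numbers are made up):

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0, 9.0]
model_pred = [2.8, 5.1, 6.9, 9.2]                          # hypothetical model output
baseline_pred = [sum(y_true) / len(y_true)] * len(y_true)  # always predict the mean

model_loss = mse(y_true, model_pred)
baseline_loss = mse(y_true, baseline_pred)
# The model is acceptable only if model_loss < baseline_loss.
```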
### ==4. Advanced layer types==
#### Number of features in Dollar Street 10
How many features does one image in the Dollar Street 10 dataset have?
- A. 64
- B. 4096
- C. 12288 +++++
- D. 878
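Answer C follows from the image shape: Dollar Street 10 images are 64×64 pixels with 3 colour channels, so flattening one image gives:

```python
# One Dollar Street 10 image: height x width x colour channels
height, width, channels = 64, 64, 3
n_features = height * width * channels  # 64 * 64 * 3 = 12288
```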
- Building a CNN looks like it uses stencil computations, doesn't it?
    - Yes, you are absolutely right.
    - Building and executing a CNN does involve stencil-like computations, particularly in the convolutional layers.
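The stencil analogy can be made concrete: a convolutional layer slides a small kernel over the image, computing a weighted sum of each neighbourhood. A naive "valid" 2-D convolution (no padding or striding; illustrative only):

```python
def conv2d(image, kernel):
    """Naive 'valid' 2-D convolution (really cross-correlation,
    as in most deep-learning frameworks)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# A 3x3 averaging kernel applied to a 4x4 image gives a 2x2 output;
# the top-left output value is the mean of the top-left 3x3 patch (6.0).
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1 / 9] * 3] * 3
out = conv2d(image, kernel)
```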
- I've seen Google use images for their CAPTCHAs. Some of them have dark edges or are over-saturated. Why black edges, though? To confuse the model?
    - Dark edges (or high contrast) in CAPTCHAs can be a deliberate design choice to confuse automated models, especially CNNs, which are good at pattern recognition.
    - Yes, black edges (or dark borders) are used to confuse the model.
    - The reason is that CNNs are sensitive to edges and contrast.
- Could you explain in more detail the relation between accuracy and loss? Is it something similar to bias vs. variance?
    - Accuracy is indirectly related to the loss: the lower the loss, the higher the accuracy. Of course, there is no upper bound for the loss, whereas accuracy gives you a metric between 0 and 1. Does that make sense? :+1:
    - Small correction: accuracy is used for classification problems. See https://developers.google.com/machine-learning/crash-course/classification/accuracy-precision-recall
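The distinction can be seen on a tiny binary-classification example: accuracy only checks the thresholded predictions, while the cross-entropy loss also measures how confident they are (the numbers are made up):

```python
import math

y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.2, 0.6, 0.4]  # hypothetical predicted probabilities of class 1

# Accuracy: fraction of thresholded predictions that match the labels.
accuracy = sum((p >= 0.5) == t for t, p in zip(y_true, y_prob)) / len(y_true)

# Binary cross-entropy: penalizes confident wrong answers heavily.
loss = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
            for t, p in zip(y_true, y_prob)) / len(y_true)

# accuracy is bounded in [0, 1]; the loss is unbounded above.
```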
#### Network depth: Try it on your own!
https://enccs.github.io/deep-learning-intro/4-advanced-layer-types/#refine-the-model
### ==5. Transfer learning & Outlook==
Keras Applications
- https://keras.io/api/applications/
- https://huggingface.co/models
- https://paperswithcode.com/sota
Which metric should we choose to validate the trained model?
- One option is `balanced accuracy`
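Balanced accuracy is the mean of per-class recall, which matters when classes are imbalanced (a plain-Python sketch; `sklearn.metrics.balanced_accuracy_score` computes the same thing):

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (fraction of each class predicted correctly)."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(y_pred[i] == c for i in idx)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# With 9 samples of class 0 and 1 of class 1, always predicting the
# majority class scores 90% plain accuracy but only 50% balanced accuracy.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
acc = sum(p == t for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
```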

:::danger
*Always ask questions at the very bottom of this document, right **above** this.*
:::
---