![](https://i.imgur.com/FICBHvt.png) ![](https://i.imgur.com/evE3GlG.png)

# PTC PRACTICAL DEEP LEARNING

**30.–31.8.2021**

## Lecturers:

Markus Koskela: markus.koskela@csc.fi
Mats Sjöberg: mats.sjoberg@csc.fi
Katja Mankinen: katja.mankinen@csc.fi

## Links

- Lecture slides: https://tinyurl.com/pdl-2021-08
- Exercises: https://github.com/csc-training/intro-to-dl/
    - Day 1: https://github.com/csc-training/intro-to-dl/tree/master/day1
    - Day 2: https://github.com/csc-training/intro-to-dl/tree/master/day2
- HackMD:
    - Program and lectures: https://hackmd.io/@pdl/HkXVVbAju
    - Day 1 exercises: https://hackmd.io/@pdl/Skmsh-0s_
    - Day 2 exercises: https://hackmd.io/@pdl/rk4AnZAj_
- CSC Notebooks: https://notebooks.csc.fi/
- Zoom recordings (will be online for a limited time only, please don't share with non-participants):
    * Monday morning: https://a3s.fi/pdl-2021-08-bdbf6abc6e2d3fb26f1de69290c4eacf/zoom_0.mp4
    * Monday afternoon (CNN): https://a3s.fi/pdl-2021-08-bdbf6abc6e2d3fb26f1de69290c4eacf/zoom_1.mp4
    * Monday afternoon (RNN): https://a3s.fi/pdl-2021-08-bdbf6abc6e2d3fb26f1de69290c4eacf/zoom_2.mp4
    * Tuesday morning part I: https://a3s.fi/pdl-2021-08-bdbf6abc6e2d3fb26f1de69290c4eacf/zoom_3.mp4
    * Tuesday morning part II: https://a3s.fi/pdl-2021-08-bdbf6abc6e2d3fb26f1de69290c4eacf/zoom_4.mp4
    * Tuesday afternoon: https://a3s.fi/pdl-2021-08-bdbf6abc6e2d3fb26f1de69290c4eacf/zoom_5.mp4

## Program

All times EEST (UTC+3)

### Day 1: Notebooks

| Time | Event |
| -------- | -------- |
| 9:00-10:30 | **Lecture 1:** Introduction to deep learning (Markus) |
| 10:30-10:50 | *Break* |
| 10:50-11:05 | **Exercise 1:** Introduction to Notebooks, Keras fundamentals |
| | Jupyter notebook: 01-tf2-test-setup.ipynb |
| 11:05-11:35 | **Lecture 2:** Multi-layer perceptron networks (Mats) |
| 11:35-12:00 | **Exercise 2:** Classification with MLPs |
| | Jupyter notebook: 02-tf2-mnist-mlp.ipynb |
| | Optional: pytorch-mnist-mlp.ipynb |
| 12:00-13:00 | *Lunch* |
| 13:00-14:00 | **Lecture 3:** Image data, convolutional neural networks (Mats) |
| 14:00-14:30 | **Exercise 3:** Image classification with CNNs |
| | Jupyter notebook: 03-tf2-mnist-cnn.ipynb |
| 14:30-14:45 | *Break* |
| 14:45-15:30 | **Lecture 4:** Text data, embeddings, recurrent neural networks (Markus) |
| 15:30-16:00 | **Exercise 4:** Text sentiment classification with RNNs |
| | Jupyter notebooks: 04-tf2-imdb-rnn.ipynb |
| | Optional: tf2-imdb-cnn.ipynb, tf2-mnist-rnn.ipynb |

### Day 2: Puhti

| Time | Event |
| -------- | -------- |
| 9:00-10:15 | **Lecture 5:** Deep learning frameworks, GPUs and batch jobs (Mats) |
| 10:15-10:30 | *Break* |
| 10:30-12:00 | **Exercise 5:** Image classification: dogs vs. cats; traffic signs |
| 12:00-13:00 | *Lunch* |
| 13:00-13:45 | **Lecture 6:** Attention models (Katja) |
| 13:45-14:30 | **Exercise 6:** Text categorization: 20 newsgroups |
| 14:30-14:50 | *Break* |
| 14:50-15:30 | **Lecture 7:** Using multiple GPUs (Markus) |
| 15:30-16:00 | **Exercise 7:** Using multiple GPUs |

## Questions and discussion

Please write new questions below all existing discussions.

* Ice breaker question: mention some cool, interesting or even controversial AI or machine learning application
    * LaMDA from Google
    * MUM
    * Github Copilot and OpenAI Codex
    * Face Recognition
    * Image recognition
    * Self Driving Cars
    * Bringing back the dead virtually (https://www.theverge.com/a/luka-artificial-intelligence-memorial-roman-mazurenko-bot_)
    * Object detection for autonomous robotic navigation
    * DeepFakes with GANs
    * AI and protein structure prediction - Alphafold2
    * Cancer spread recognition
    * Natural Computing
* Link: https://github.com/csc-training/intro-to-dl/tree/master/day1#day-1
* Do we have recordings of these sessions?
    * Is it possible to have the recordings available for only two weeks, after which they are removed? I am also running some MD simulations on the side.
    * OK, we will try to record the lectures and provide a temporary link afterwards.
* Do we need to install any software? It seems we don't need to, based on the info provided.
    * Mainly a web browser. An SSH client is needed for Tuesday.
    * Info about SSH clients (copied from the intro email everybody got):
      > MacOS and Linux computers should have the ssh command already available from a
      > terminal session. Recent versions of Windows may also include an SSH client. It
      > can be run as "ssh" either in a PowerShell window or in a Command Prompt window.
      > Otherwise, you need to install an SSH client yourself. We recommend either
      > MobaXterm (https://mobaxterm.mobatek.net/) or PuTTY (https://www.putty.org/).
* When using deep learning over large datasets, aren't big computing centers like CSC concerned about the energy consumption? If so, are you planning to use FPGAs to offload the deep learning tasks?
    * We have not planned to use FPGAs, but our new LUMI supercomputer has taken CO<sub>2</sub> emissions seriously, with 100% of the power produced with hydroelectricity. For more info see: https://www.lumi-supercomputer.eu/sustainable-future/
* How do you assess the generalization of a model? Will we be exploring this in the practical sessions?
    * Generalization means that the model also performs well on *new* data that it hasn't seen before. This is typically assessed by leaving a *test set* of data completely outside of the training process. After training has finished, we test the model with this test set. If the model still performs well, we can say it generalizes well.
    * So it is always done a posteriori?
    * Yes, always in research for example. Reporting the accuracy on the training set is not considered very relevant, as we could trivially learn to recognize the training set items perfectly but fail completely for new items (i.e., poor generalization). Using a separate test set is better.
    * Thanks! I understand the distinction between the different types of datasets. Adversarial sets are new to me though (first time I heard of them).
* From my understanding, the reason we don't use the entire dataset to compute the gradient is the excessively large computation time required (please correct me if I am wrong :)). In this sense, is it sometimes better to compute the gradient at each step using all the items in the dataset, or to simply add more data (using mini-batches), if we want to improve the performance of the model?
    * This is a complex topic. In practice it works better to have a small (but not too small) batch size, as it adds some randomness to the optimization process and you are not averaging over the whole dataset (which easily gets you stuck in a local minimum).
    * Thank you very much! That makes total sense :)
* What is the impact (in a real-life data set) of going to and fro between Python objects and a GPU?
* I am using the same file but my result is different: loss: 2.4697508811950684 acc: 0.09375
    * I believe that is because of using the random function. We all might get different results. Thank you.
* What is the bias exactly (wo)?
    * It is a weight that is not associated with any input, so it is a constant that is added to the weighted sum. In the linear classifier in the example it is the intercept.
    * I guess there are ways to choose/optimize the bias value?
    * The bias is a weight like any other weight, so it is optimized with gradient descent like the other weights. Initially all weights are typically random.

If the forward propagation gives a good prediction, do we still need to do backpropagation? Or is it recommended to do backpropagation anyway, so that the reliability of the system improves?
* Backpropagation changes the network, so it is needed if/when you are training the network. Forward propagation only computes the output of the network for the current input. So forward propagation is always done, backprop only when you are training.
* Thank you so much.

Question: Is backpropagation part of the model.fit command? Does it happen for each batch or after each epoch?
* Yes, the .fit() command includes the backpropagation step. In Keras it is hidden from view, but in PyTorch backprop is an explicit function that needs to be called. More about PyTorch tomorrow.
* And backprop happens after each minibatch has been processed, so there are several updates per epoch.
* For visualizing the confusion matrix: `from sklearn.metrics import ConfusionMatrixDisplay` `cm_display = ConfusionMatrixDisplay(cm).plot()`
    * Nice!

Question: What decides the number of neurons we need in each layer?
* In hidden layers, the number of neurons is a hyperparameter that needs to be set by the user. There are AutoML approaches that also optimize hyperparameters, but that's outside the scope of this course (see e.g. https://autokeras.com/).

Question: For the 3-D image case, are the weights the same for each colour plane? Or can they be different?
* The weights are different, so different colors can have different effects for the kernels.
* Thank you!
* Slide 15 shows some filters obtained for color images that illustrate colors in the kernels.
* Small follow-up question: the second layer, which detects higher-level features, is now fully connected, right? Because the first-level features can all be related to one another and there is no spatial relationship?
* Got it, so there is a spatial relationship between these as well, right? Thank you!
* Yes, spatial relationships remain, although their importance is gradually reduced. This will be explained more in a bit.

Question: Is there a way to find the best accuracy or the best value of a hyperparameter without the brute-force method we are using now (changing the epochs, optimizer etc.)?
* Not directly, as the neural network training can only optimize things in the loss function which are differentiable (you can calculate partial derivatives).
* Also check: https://autokeras.com/

Question: If we are using some sort of optimization algorithm (genetic optimization etc.) to improve the accuracy or other parameters, what are the ways to use it?
* Gradient descent is the standard way to optimize the loss function in training neural networks. There are other approaches as well (e.g. genetic algorithms, particle swarm optimization), but they are rather esoteric, and I am not very familiar with the current methods.

Question: Will you share a link to today's recordings in hackmd.io?
* Sure. We need to convert the files and upload them to the object storage, so it will take some time, but we'll post the links then.
* Links to the recordings can now be found at the top of this page.

Question: How much longer will the training accounts be active for the participants after the course?
* Expiration time is 01.09.2021 16:00.
* The GPU reservation will end today at 18:00 however, so after that you need to use the standard queues.

Question: How does this relate I/O-wise? If > 1000 images are processed, is the I/O only sequential?
* We'll discuss I/O more in the afternoon lectures, but usually several CPUs are used for I/O. In Puhti we have 10 CPUs per GPU, which can (and often should) be used for data preprocessing and I/O.

On Puhti, how long does it take for the batch job ID to be generated after submitting a job? I understand it depends on the number of users at that moment, but in general? Also, do new users get priority in the queue?
* The job ID is a number that increases every time a job is submitted. So 7394681 is the 7394681st job ever submitted to Puhti.
* The queue priorities are calculated in a rather delicate way and are a function of the resources requested, your previous usage of the system etc. See also https://docs.csc.fi/support/faq/when-will-my-job-run/

On Puhti, isn't the command for submitting a batch job "tail -f .log(file)" instead of the .sh file?
* "tail -f" is used to read a text file as it is generated, not to submit jobs. I am not aware of the ".log(file)" part; maybe it is part of some code that has been developed to submit and analyse jobs submitted to Puhti?
* I agree, "less .log(file)" is for checking the progress. We use this for running molecular dynamics simulations on Puhti. Thank you so much.

Can PyTorch and TensorFlow be used for regression tasks, or are they only used for CNNs and RNNs?
* Yes, they can be used. Unfortunately we don't have any examples of regression on the course. One example of regression with Keras and an MLP can be found at https://github.com/CSCfi/machine-learning-scripts/blob/master/notebooks/tf2-chd-mlp.ipynb
* And to clarify, CNNs and RNNs (as well as MLPs) can all be used for regression. The main thing is that the output layer of the network needs to be suitable for regression. Usually this means using mean squared error as the loss function and no nonlinear activation for the output layer.
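
To make the last answer concrete, here is a minimal sketch of regression with a linear output and mean squared error. It is not from the course material: it uses plain Python instead of Keras or PyTorch, and the function names, the toy data and all parameter values (`epochs`, `batch_size`, `lr`) are assumptions chosen only for illustration. The "network" is a single linear unit with one weight and a bias, trained with mini-batch gradient descent.

```python
import random


def train_linear_regressor(xs, ys, epochs=200, batch_size=4, lr=0.05, seed=0):
    """Fit y ~ w*x + b by mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    # The weight starts at a small random value; the bias is trained like any weight.
    w, b = rng.uniform(-0.1, 0.1), 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new random mini-batches every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Forward pass: linear output, no nonlinear activation.
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            # Gradients of the mean squared error with respect to w and b.
            grad_w = 2 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)
            # One parameter update per mini-batch, so several updates per epoch.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the linear model on the given data."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: points on the line y = 3x - 1, without noise.
xs = [i / 10 for i in range(-10, 11)]
ys = [3 * x - 1 for x in xs]

w, b = train_linear_regressor(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}, mse = {mse(xs, ys, w, b):.6f}")
```

In a Keras or PyTorch model the same idea would apply to the last layer only: the hidden layers keep their nonlinear activations, while the output layer stays linear and the loss is mean squared error.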