 
[**PRACE AUTUMN SCHOOL 2021**](https://events.prace-ri.eu/event/1188/)
# INTRODUCTION TO DEEP LEARNING
**12.–15.10.2021** Vuokatti, Finland
---
## :busts_in_silhouette: Lecturers
Markus Koskela: markus.koskela@csc.fi
Mats Sjöberg: mats.sjoberg@csc.fi
---
## :link: Links
- Lecture slides: https://tinyurl.com/pdl-2021-autumn
- Exercises: https://github.com/csc-training/intro-to-dl/tree/vuokatti2021
- Notebooks: https://github.com/csc-training/intro-to-dl/tree/vuokatti2021/notebooks
- Cluster: https://github.com/csc-training/intro-to-dl/tree/vuokatti2021/slurm
- HackMD:
  - Program and lectures: https://hackmd.io/@pdl/Skn7I48MY
  - Notebooks exercises: https://hackmd.io/@pdl/Sy52T1s4F
  - Cluster exercises: https://hackmd.io/@pdl/rJMw1lsNF
- CSC Notebooks: https://notebooks.csc.fi/
---
## :calendar: Program
The detailed program of the "Introduction to deep learning" hands-on track can be found below. The program of the entire Autumn school is available at https://events.prace-ri.eu/event/1188/.
All times EEST (UTC+3).
### Day 1: Tuesday Oct 12
:::spoiler
| Time | Event |
| -------- | -------- |
| 14:00-15:30| **Lecture 1:** Introduction to deep learning (Markus)
| 15:30-15:45| *Break*
| 15:45-16:00| **Notebooks Exercise 1:** Introduction to Notebooks, Keras fundamentals
| | Jupyter notebook: *01-tf2-test-setup.ipynb*
| 16:00-16:30| **Lecture 2:** Multi-layer perceptron networks (Mats)
| 16:30-17:00| **Notebooks Exercise 2:** Classification with MLPs
| | Jupyter notebook: *02-tf2-mnist-mlp.ipynb*
| | Optional: *pytorch-mnist-mlp.ipynb, tf2-chd-mlp.ipynb*
| 17:00-18:00| *Dinner*
| 18:00-19:00| **Lecture 3:** GPUs, CSC's Supercomputers (Mats)
| 19:00-21:00| **Cluster Exercise 1:** MLP hyperparameter optimization
:::
### Day 2: Wednesday Oct 13
:::spoiler
| Time | Event |
| -------- | -------- |
| 14:00-15:00| **Lecture 4:** Image data, convolutional neural networks (Mats)
| 15:00-15:30| **Notebooks Exercise 3:** Image classification with CNNs
| | Jupyter notebook: *03-tf2-mnist-cnn.ipynb*
| 15:30-15:45| *Break*
| 15:45-16:30| **Lecture 5:** Text data, embeddings, recurrent neural networks (Markus)
| 16:30-17:00| **Notebooks Exercise 4:** Text sentiment classification with RNNs
| | Jupyter notebooks: *04-tf2-imdb-rnn.ipynb*
| | Optional: *tf2-imdb-cnn.ipynb, tf2-mnist-rnn.ipynb*
| 17:00-18:00| *Dinner*
| 18:00-21:00| **Lecture 6:** CNN architectures and applications (Mats)
| | **Cluster Exercise 2:** Image classification
| | **Cluster Exercise 3:** Text categorization
:::
### Day 3: Thursday Oct 14
:::spoiler
| Time | Event |
| -------- | -------- |
| 9:00-10:00 | **Lecture 7:** Using multiple GPUs (Markus)
| 10:00-10:30| **Cluster Exercise 4:** Using multiple GPUs
| 10:30-11:30| **Lecture 8:** Attention models (Markus)
| 11:30-12:00| **Cluster Exercise 5:** Text categorization with BERT
:::
### Day 4: Friday Oct 15
:::spoiler
| Time | Event |
| -------- | -------- |
| 9:00-10:00 | **Lecture 9:** AutoML (Mats)
| 10:00-11:00| **Cluster Exercise 6:** Hyperparameter optimization with Ray Tune
:::
---
## :interrobang: Questions and discussion
:::info
:pencil: Please add new questions and topics below all existing discussions.
:::
* Warm-up question: What do you think is the most exciting area in deep learning or artificial intelligence?
* BERT
* Drug discovery
* So-called general AI
* Deep learning
* Games like chess
* What is a good architecture for time-series and data with high spatial dependence?
* Maybe multilayer perceptrons (MLPs)?
* Architecture for what exactly? Storing the data? Like TimescaleDB perhaps?
* TensorFlow sounds like a tool, not a specific ML architecture for modeling data.
* Data arranged on a plane or in space, e.g., nodes in a finite element mesh: when we want to use them as model input, e.g., for a surrogate model, we know there is a strong correspondence between nodes in a neighborhood.
* Often recurrent neural networks (RNNs) are used for time-series, Convolutional neural networks (CNNs) for spatial dependencies. Both will be covered in the lectures tomorrow.
* Nice :) thanks.
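As a minimal, hedged sketch of the two model families mentioned above (an RNN for sequences, a CNN for spatial data) in Keras; all shapes and layer sizes are illustrative assumptions, not taken from the course exercises:

```python
from tensorflow import keras

# Hypothetical inputs: a 100-step univariate time series,
# and 28x28 single-channel images.
rnn = keras.Sequential([
    keras.layers.Input(shape=(100, 1)),
    keras.layers.LSTM(32),      # recurrent layer captures temporal dependencies
    keras.layers.Dense(1),      # e.g., one-step-ahead regression output
])

cnn = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(16, 3, activation="relu"),  # local spatial features
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(10, activation="softmax"),   # e.g., 10-class output
])
```

Both will be covered properly in tomorrow's lectures; this only shows how the layer choice reflects the data's structure.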
* Is there any well-defined heuristic to find the best compromise between performance and overfitting? (I guess we will discuss that later)
* No ;-)
* Deep learning is very powerful with huge amounts of data; if you don't have a huge amount of data, you should consider other methods. How many samples are considered a huge amount, and how many are considered little? Is there any way to evaluate this, or is there a cutoff value?
* There is no exact answer to that question; it depends so much on how difficult the learning problem is. Learning general image features probably needs at least tens of thousands of images (probably more). But in that case one can often use pre-trained models which have been trained on different but similar data (more on this tomorrow). So it really depends on the situation :-)
* How to process non-image based data?
* Any data that can be expressed as real-valued vectors or tensors can be processed. For example, text can be transformed to real numbers; this will be covered tomorrow.
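As a small illustration of turning text into real numbers, here is a hedged Keras sketch using a `TextVectorization` layer followed by an `Embedding` layer (the example sentences and dimensions are made up, not course material):

```python
import tensorflow as tf
from tensorflow import keras

# Learn a word-to-integer vocabulary from a toy corpus,
# padding/truncating every sentence to 4 tokens.
vectorizer = keras.layers.TextVectorization(output_sequence_length=4)
vectorizer.adapt(["the cat sat", "the dog ran"])

ids = vectorizer(tf.constant(["the cat ran"]))   # integer ids, shape (1, 4)

# Map each integer id to a trainable 8-dimensional real-valued vector.
embed = keras.layers.Embedding(input_dim=vectorizer.vocabulary_size(),
                               output_dim=8)
vectors = embed(ids)                             # shape (1, 4, 8)
```

The resulting real-valued tensor can then be fed into the network layers covered in the lectures.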
* Is it possible to give different types of inputs for the same problem? E.g., for cat vs. dog data, could we give both images and recordings of the animals' voices? How could we test these inputs for differentiating them?
* How do we get access to the cluster?
* How long do we have access to the cluster?
* The training account expiration time is 15.10.2021 22:00. The GPU reservation is until 15.10.2021 12:00 (noon).
* Examples of loading and saving Keras models can be found in `slurm/tf2-dvc-cnn-simple.py` and `slurm/tf2-dvc-cnn-evaluate.py`, covered on Wednesday
* What is the best practice when tuning hyperparameters with validation accuracy? For example with MNIST, should we use the test set, or should we divide the training set in a training set and a validation set, and keep the test set for final validation?
* Yes, we should split the training set and use a portion of it for validation. We will be doing exactly that in later exercises.
* More general question: does hyperparameters fine-tuning (based on validation accuracy) indirectly lead to overfitting the validation data?
* Yes, that is a possibility, and the danger increases as you keep trying different models and hyperparameter values. Therefore you cannot fully trust the validation results, and that is why we keep a separate test set that is used only sparingly.
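The split described above can be sketched as follows (the dataset and split sizes are arbitrary assumptions for illustration):

```python
import numpy as np

# Hypothetical dataset of 1000 samples: hold out a test set first,
# then carve a validation set out of the remaining training data.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 8))
y = rng.integers(0, 2, size=1000)

n_test, n_val = 200, 160
x_test,  y_test  = x[:n_test],  y[:n_test]
x_rest,  y_rest  = x[n_test:],  y[n_test:]
x_val,   y_val   = x_rest[:n_val], y_rest[:n_val]
x_train, y_train = x_rest[n_val:], y_rest[n_val:]

# Tune hyperparameters against (x_val, y_val);
# touch (x_test, y_test) only for the final evaluation.
```

With Keras one can also pass `validation_split` to `model.fit` to get the validation portion automatically, but the test set should still be kept aside manually.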
* Some examples on how to deploy ML models on Rahti: https://github.com/cscfi/rahti-ml-examples