# Neural Architecture Search 101
<br>
<br>
<small>Neil John D. Ortega</small>
---
### Agenda
- Motivation
- Taxonomy
- AutoML.org
- Recap
---
### Motivation
- Can we automate the discovery of **novel** neural architectures?
- Manual design - takes time, is prone to errors, hard to systematize, and introduces human bias
---
### Taxonomy
- Search space
- Search strategy
- Performance estimation strategy

<small><strong>Fig. 1.</strong> Relationships between NAS method categories [1]. Accessed 8 Nov 2020.</small>
---
## Search space
----
### Search space
- What architectures, in principle, can we represent?
- Layers, operations, connections, etc.
- Search space size vs. human bias

<small><strong>Fig. 1.</strong> Relationships between NAS method categories [1]. Accessed 8 Nov 2020.</small>
----
### Chain-structured networks
- Number of layers (could be unbounded)
- Type of operation for each layer (e.g. pooling, convolution, etc.)
- Hyperparameters for each operation (e.g. number of filters and kernel size for a convolutional layer)
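
A chain-structured space can be encoded as a list of (operation, hyperparameter) choices; a minimal sketch (operation names and value ranges below are illustrative, not taken from [1]):

```python
import random

# Illustrative chain-structured search space: each layer is an
# (operation, hyperparameters) pair, and the chain length is also searched.
OPS = {
    "conv": {"filters": [16, 32, 64], "kernel": [1, 3, 5]},
    "pool": {"kernel": [2, 3]},
}

def sample_chain(max_layers=10):
    """Sample one chain-structured architecture from the space."""
    n_layers = random.randint(1, max_layers)
    arch = []
    for _ in range(n_layers):
        op = random.choice(list(OPS))
        params = {k: random.choice(v) for k, v in OPS[op].items()}
        arch.append((op, params))
    return arch

print(sample_chain())
```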
----
### Chain-structured networks

<small><strong>Fig. 2.</strong> A chain-structured network [1]. Accessed 9 Nov 2020.</small>
----
### Multi-branch networks
- Input to layer $i$ is a generic function of outputs of previous layers $0,...,i-1$
- e.g. ResNets, DenseNets
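
In code, the multi-branch idea just says that layer $i$'s input is built from all earlier outputs; a framework-free NumPy sketch (the toy `make_layer` and the sum/concat combining rules are illustrative stand-ins for ResNet/DenseNet-style connectivity):

```python
import numpy as np

def make_layer(width=8, seed=0):
    """Toy layer: collapse a (variable-width) input, then expand it back
    to a fixed width so both combining rules below keep valid shapes."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=width)
    return lambda inp: np.tanh(inp.mean(axis=-1, keepdims=True) * w)

def forward_multibranch(x, layers, combine="concat"):
    """Each layer receives a generic function of all previous outputs."""
    outputs = [x]
    for layer in layers:
        if combine == "sum":                         # ResNet-like skip connections
            inp = sum(outputs)
        else:                                        # DenseNet-like dense connectivity
            inp = np.concatenate(outputs, axis=-1)
        outputs.append(layer(inp))
    return outputs[-1]

x = np.ones((2, 8))
layers = [make_layer(seed=i) for i in range(3)]
print(forward_multibranch(x, layers, combine="concat").shape)  # (2, 8)
```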
----
### Multi-branch networks

<small><strong>Fig. 3.</strong> A multi-branched neural network [1]. Accessed 9 Nov 2020.</small>
----
### Cell-based representation
- Cells or blocks
- "Mini-networks" as building blocks, instead of individual layers
- Cell architecture is learned
- Pros :+1:
  - Drastically reduced search space size - cells usually consist of significantly fewer layers
  - Easily adaptable to other data sets
  - Repeated building blocks have proved to be a useful design principle (e.g. LSTM cells, stacked ResNet blocks, etc.)
  - Useful in controlling granularity
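
A minimal sketch of the idea (candidate operations, cell shape, and stacking scheme below are illustrative): search a small cell once, then stack copies of it to build the full network.

```python
import random

CANDIDATE_OPS = ["sep_conv_3x3", "sep_conv_5x5", "max_pool_3x3", "identity"]

def sample_cell(n_nodes=4):
    """Sample a cell: each node picks one operation
    (a simplified chain-like cell; real cells are small DAGs)."""
    return [random.choice(CANDIDATE_OPS) for _ in range(n_nodes)]

def build_network(cell, n_stacks=3, cells_per_stack=2):
    """Stack copies of the same learned cell; insert a reduction step
    between stacks (represented here by a marker tuple)."""
    network = []
    for s in range(n_stacks):
        network.extend(("normal", op)
                       for _ in range(cells_per_stack) for op in cell)
        if s < n_stacks - 1:
            network.append(("reduce", "stride-2"))
    return network

cell = sample_cell()
print(len(build_network(cell)))  # small cell -> much smaller search space
```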
----
### Cell-based representation
- Normal cells
  - Preserve the dimensionality of the input
- Reduction cells
  - Reduce the dimensionality of the input

<small><strong>Fig. 4.</strong> Stacked normal and reduction cell architectures for CIFAR-10 and ImageNet [3]. Accessed 9 Nov 2020.</small>
----
### Cell-based representation
- Macro- vs. micro-architecture
- Ideally, should be learned jointly
----
### Hierarchical structure
- Generalized version of the cell-based representation
- Most work used a fixed macro-architecture and optimized the repeated micro-architecture

<small><strong>Fig. 5.</strong> An example of a three-level hierarchical structure [2]. Accessed 11 Nov 2020.</small>
---
## Search strategy
----
### Search strategy
- How do we explore the search space?
- Exploration vs. exploitation

<small><strong>Fig. 1.</strong> Relationships between NAS method categories [1]. Accessed 8 Nov 2020.</small>
----
### Random search
- Most naïve baseline
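
In sketch form, random search is just "sample, evaluate, keep the best"; `sample_architecture()` and `evaluate()` below are placeholders for the actual search space and training pipeline:

```python
import random

def sample_architecture():
    """Placeholder: draw a random point from the search space."""
    return {"n_layers": random.randint(2, 10),
            "width": random.choice([16, 32, 64])}

def evaluate(arch):
    """Placeholder: stands in for training + measuring validation accuracy."""
    return random.random()

best_arch, best_acc = None, -1.0
for _ in range(100):                      # search budget
    arch = sample_architecture()
    acc = evaluate(arch)
    if acc > best_acc:
        best_arch, best_acc = arch, acc
print(best_arch, best_acc)
```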
----
### Reinforcement learning (RL)
- Action space - a list of hyperparameters ("tokens") generated by the controller, which define a child network
- Reward - validation accuracy of a child network
- Loss - the controller parameters $\theta$ are optimized with a policy-gradient objective (e.g. REINFORCE)

<small><strong>Fig. 6.</strong> Overview of RL-based NAS [4]. Accessed 11 Nov 2020.</small>
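
A heavily simplified, framework-free sketch of the loop in Fig. 6: a softmax "controller" over a single token, updated with a REINFORCE-style gradient (the real controller is an RNN generating a whole token sequence [4]; the `reward` function below is a stand-in for training a child network):

```python
import numpy as np

rng = np.random.default_rng(0)
TOKENS = [16, 32, 64, 128]          # e.g. number of filters for one layer
logits = np.zeros(len(TOKENS))      # controller parameters theta

def reward(token):
    """Placeholder for 'train child network, return validation accuracy'."""
    return 0.5 + 0.1 * np.log2(token / 16) + 0.05 * rng.normal()

baseline, lr = 0.0, 0.1
for step in range(200):
    probs = np.exp(logits - logits.max()); probs /= probs.sum()
    a = rng.choice(len(TOKENS), p=probs)          # sample an action/token
    R = reward(TOKENS[a])
    baseline = 0.9 * baseline + 0.1 * R           # moving-average baseline
    grad = -probs; grad[a] += 1.0                 # d log pi(a) / d logits
    logits += lr * (R - baseline) * grad          # REINFORCE update
print(TOKENS[int(np.argmax(logits))])             # controller's preferred token
```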
----
### Reinforcement learning (RL)

<small><strong>Fig. 7.</strong> How a controller (RNN) is used to generate convolution layers [4]. Accessed 11 Nov 2020.</small>
----
### Evolutionary algorithms
- "Genes" encoding the information to create a network (e.g. connection weights, topology, etc.)
- Best results when used to determine the architecture only and not the weights
----
### Evolutionary algorithms
- Grow the population by:
  - Selecting as parents the genes with the highest accuracy in each iteration
----
### Evolutionary algorithms
- Grow the population by:
  - Introducing mutations to the genes (i.e. modifying the weights, connections, etc.)

<small><strong>Fig. 8.</strong> Network architecture mutations in NEAT [5]. Accessed 11 Nov 2020.</small>
----
### Evolutionary algorithms
- Grow the population by:
  - Crossing parent genes to create "offspring"

<small><strong>Fig. 9.</strong> Offspring networks in NEAT [5]. Accessed 11 Nov 2020.</small>
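
Putting the three mechanisms together, a toy sketch of the evolutionary loop (genes are flat lists of layer widths; `fitness` is a placeholder for decoding the genes, training the network, and measuring validation accuracy):

```python
import random

def fitness(genes):
    """Placeholder for 'decode genes -> train network -> validation accuracy'."""
    return sum(genes) / (len(genes) * 128) + random.gauss(0, 0.01)

def mutate(genes):
    """Mutation: perturb one 'layer width' gene."""
    g = list(genes)
    g[random.randrange(len(g))] = random.choice([16, 32, 64, 128])
    return g

def crossover(a, b):
    """Crossover: take a prefix from one parent, a suffix from the other."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

population = [[random.choice([16, 32, 64, 128]) for _ in range(4)]
              for _ in range(20)]
for generation in range(30):
    parents = sorted(population, key=fitness, reverse=True)[:5]   # best genes
    children = [mutate(random.choice(parents)) for _ in range(10)]
    children += [crossover(*random.sample(parents, 2)) for _ in range(5)]
    population = parents + children                               # next generation
print(max(population, key=fitness))
```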
----
### Gradient descent
- Possible but involves converting the discrete search space into a differentiable one (how do you make "adding a layer" differentiable?)
- Typically done with joint learning of architecture parameters and network weights
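
The trick used in DARTS [6], in sketch form: the hard choice of one operation per edge is relaxed into a softmax-weighted mixture of all candidates, so the architecture parameters $\alpha$ receive gradients (toy scalar operations below stand in for conv/pool blocks):

```python
import numpy as np

rng = np.random.default_rng(0)
# Candidate operations on one edge (toy scalar ops instead of conv/pool).
ops = [lambda x: x, lambda x: np.maximum(x, 0.0), lambda x: 0.0 * x]
alpha = rng.normal(size=len(ops))          # architecture parameters

def mixed_op(x, alpha):
    """Continuous relaxation: a softmax(alpha)-weighted sum of all candidates
    instead of a hard (non-differentiable) pick of a single operation."""
    w = np.exp(alpha - alpha.max()); w /= w.sum()
    return sum(wi * op(x) for wi, op in zip(w, ops))

y = mixed_op(np.array([-1.0, 2.0]), alpha)
# After search, the edge is discretized: keep the op with the largest alpha.
print(int(np.argmax(alpha)), y)
```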
---
## Performance estimation strategy
----
### Performance estimation strategy
- How do we estimate the predictive performance on test data?
- Standard training and validation gives the true performance but is computationally expensive

<small><strong>Fig. 1.</strong> Relationships between NAS method categories [1]. Accessed 8 Nov 2020.</small>
----
### Train from scratch
- Train every child network independently (ideally in parallel) from scratch until *convergence*, then measure its validation accuracy
- Computationally expensive (~1000 GPU days! :scream:)
----
### Lower fidelity estimates
- Train on a smaller dataset
- Train on fewer epochs
- Train and evaluate a downsized model during the search stage
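
All of these boil down to evaluating each candidate under a cheaper proxy; a sketch of what such a fidelity configuration could look like (names and values are purely illustrative):

```python
# Illustrative proxy vs. full evaluation protocol for a candidate architecture.
FULL_FIDELITY = {"train_fraction": 1.00, "epochs": 100, "width_multiplier": 1.0}
LOW_FIDELITY  = {"train_fraction": 0.25, "epochs": 10,  "width_multiplier": 0.5}

def estimate_performance(arch, fidelity=LOW_FIDELITY):
    """Placeholder: train `arch` on a data subset, for fewer epochs,
    with a downscaled model, and return its validation accuracy."""
    raise NotImplementedError
```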
----
### Learning curve extrapolation
1. Train with just a few epochs
2. Model the learning curve (of the child models) as a time-series regression problem
3. Extrapolate using the model
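
A minimal sketch of steps 2-3 using a single power-law curve fitted with NumPy (real methods use ensembles of parametric curves and Bayesian regression; the accuracy ceiling `c_max` below is an assumed value):

```python
import numpy as np

# Validation accuracy observed over the first few epochs of a child model.
epochs = np.arange(1, 6)
acc = np.array([0.42, 0.55, 0.61, 0.65, 0.68])

# Fit acc ~ c_max - a * epoch^(-b) by linear regression in log space,
# using c_max as a crude guess of the accuracy ceiling.
c_max = 0.80
coeffs = np.polyfit(np.log(epochs), np.log(c_max - acc), deg=1)
b, log_a = -coeffs[0], coeffs[1]

def extrapolate(epoch):
    """Predict accuracy at a later epoch from the fitted curve."""
    return c_max - np.exp(log_a) * epoch ** (-b)

print(extrapolate(50))   # stop early if the prediction looks unpromising
```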
----
### Weight inheritance
- Uses a parent model as a warm start for new child models
- Saves a lot of GPU compute, especially with more aggressive weight sharing
- Sampled child models can be viewed as *subgraphs* within the parent *supergraph*
----
### One-shot models
- Only a single model needs to be trained
- Weights are then shared across child networks that are just subgraphs of the one-shot model
- Gradient descent can then be used for the bilevel optimization of architecture parameters and network weights (e.g. DARTS [6])
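
A minimal NumPy sketch of the weight-sharing idea: the one-shot model holds one set of weights per (layer, candidate operation), and every child architecture is just a path through that supergraph evaluated with the shared weights (toy matrix "operations" stand in for real layers):

```python
import numpy as np

rng = np.random.default_rng(0)
N_LAYERS, WIDTH, CANDIDATES = 3, 8, ["op_a", "op_b", "op_c"]

# The one-shot model: one shared weight matrix per (layer, candidate op).
shared = {(l, op): rng.normal(size=(WIDTH, WIDTH)) * 0.1
          for l in range(N_LAYERS) for op in CANDIDATES}

def forward_child(x, choices):
    """A child network is a subgraph: one candidate per layer,
    evaluated with the *shared* weights (no retraining per child)."""
    for l, op in enumerate(choices):
        x = np.tanh(x @ shared[(l, op)])
    return x

x = rng.normal(size=(4, WIDTH))
child = [CANDIDATES[int(rng.integers(len(CANDIDATES)))] for _ in range(N_LAYERS)]
print(child, forward_child(x, child).shape)
```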
----
### One-shot models

<small><strong>Fig. 10.</strong> Simplified overview of one-shot architecture search [1]. Accessed 12 Nov 2020.</small>
---
### AutoML.org

<small><strong>Fig. 11.</strong> AutoML.org focuses on the progressive and systematic automation of machine learning [7]. Accessed 12 Nov 2020.</small>
----
### AutoML.org
- Research focus on optimizing and automating ML
  - Hyperparameter optimization
  - Neural architecture search
  - Meta-learning
- Released several [NAS benchmarks](https://www.automl.org/nas-4/nasbench/) to organize the results seen so far and to help guide future research
- Released *"a very early pre-alpha version"* of [Auto-PyTorch](https://github.com/automl/Auto-PyTorch)
  - Has support for image classification
---
### Recap
<style>
.reveal ul {font-size: 32px !important;}
</style>
- Neural architecture search aims to automate and systematize the discovery of **novel** neural architectures
- Approaches can be classified according to:
  - Search space
  - Search strategy
  - Performance estimation strategy
- [AutoML.org](https://automl.org) is a good place to start if you want to learn more!
---
# Thank you! :nerd_face:
---
### References
<!-- .slide: data-id="references" -->
<style>
.reveal p {font-size: 20px !important;}
.reveal ul, .reveal ol {
display: block !important;
font-size: 32px !important;
}
section[data-id="references"] p {
text-align: left !important;
}
</style>
[1] Elsken, Thomas et al. ["Neural Architecture Search: A Survey."](https://arxiv.org/abs/1808.05377) ArXiv abs/1808.05377 (2019).
[2] Liu, Hanxiao et al. ["Hierarchical Representations for Efficient Architecture Search."](https://arxiv.org/abs/1711.00436) ArXiv abs/1711.00436 (2018).
[3] Zoph, Barret et al. ["Learning Transferable Architectures for Scalable Image Recognition."](https://arxiv.org/abs/1707.07012) 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018).
[4] Zoph, Barret and Quoc V. Le. ["Neural Architecture Search with Reinforcement Learning."](https://arxiv.org/abs/1611.01578) ArXiv abs/1611.01578 (2017).
[5] Stanley, Kenneth and Risto Miikkulainen. ["Evolving Neural Networks through Augmenting Topologies"](http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf) Evolutionary Computation 10(2): 99-127 (2002).
[6] Liu, Hanxiao et al. ["DARTS: Differentiable Architecture Search."](https://arxiv.org/pdf/1806.09055.pdf) ArXiv abs/1806.09055 (2019).
[7] [AutoML.org](https://www.automl.org/)
{"metaMigratedAt":"2023-06-15T15:26:39.710Z","metaMigratedFrom":"YAML","title":"Neural Architecture Search 101","breaks":true,"description":"View the slide with \"Slide Mode\".","slideOptions":"{\"spotlight\":{\"enabled\":true}}","contributors":"[{\"id\":\"ed2adf4d-7b64-4cc8-9c2f-656c184d7122\",\"add\":19389,\"del\":9716}]"}