Meeting Minutes
===
###### tags: `Meeting`
:::info
- **Date:** July 1st, 2021 4:00 PM (PST)
- **Agenda**
    1. Zero initialization `90 minutes`
- **Presenter:** Jiawei Zhao
:::
# Intro
## Motivations
The goal is not well defined; we need a better answer to why zero initialization is better.
**Motivation 1**
1. The example on a linear model is far from real networks.
2. It is not clear why a solution with minimum L2 norm is better than others (e.g., minimizers of a Bregman divergence); see the sketch below.
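
For reference (standard definitions, added here only as an assumption about the intended comparison, not taken from the presentation): the minimum-L2-norm interpolant versus a general Bregman-divergence minimizer can be written as

$$
\min_{w}\ \|w\|_2^2 \ \ \text{s.t.}\ \ Xw = y
\qquad\text{vs.}\qquad
\min_{w}\ D_\psi(w, w_0) \ \ \text{s.t.}\ \ Xw = y,
\qquad
D_\psi(w, w_0) = \psi(w) - \psi(w_0) - \langle \nabla\psi(w_0),\, w - w_0\rangle .
$$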
**Motivation 2**
1. The residual network formulation in the theorem is a linear model, which is still far from real networks.
2. The theorem says the global minimizer goes to zero as the network gets deeper. But why is that desirable? Why should we use an ultra-deep network whose global minimizer is close to zero?
3. The wording "symmetric learning problem" is confusing. (Break the symmetry?)
4. Being batchnorm-free doesn't seem like a strong selling point; the proposed block has an additional scalar factor and two bias terms (a sketch of such a block follows below).
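
A minimal sketch of what such a block might look like, assuming a Fixup/SkipInit-style design with a zero-initialized scalar and two bias terms (hypothetical PyTorch code; the actual proposed block may differ):

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Hypothetical batchnorm-free residual block with one scalar and two biases."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bias1 = nn.Parameter(torch.zeros(1))   # first bias term
        self.bias2 = nn.Parameter(torch.zeros(1))   # second bias term
        self.scale = nn.Parameter(torch.zeros(1))   # scalar factor, zero-initialized
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1(x + self.bias1))
        out = self.conv2(out) + self.bias2
        return x + self.scale * out                 # residual branch starts at zero
```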
> Aside: It would be good to have someone present normalization techniques, including instance norm, layer norm, and group norm.
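
As a quick reference for that presentation, the standard PyTorch modules differ only in which axes the statistics are computed over (shapes below are illustrative only):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 32, 16, 16)               # (batch, channels, height, width)

batch_norm    = nn.BatchNorm2d(32)           # statistics over (N, H, W), per channel
layer_norm    = nn.LayerNorm([32, 16, 16])   # statistics over (C, H, W), per sample
instance_norm = nn.InstanceNorm2d(32)        # statistics over (H, W), per sample and channel
group_norm    = nn.GroupNorm(8, 32)          # statistics over channel groups, per sample

for norm in (batch_norm, layer_norm, instance_norm, group_norm):
    assert norm(x).shape == x.shape          # all preserve the input shape
```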
## Experiments
## Future work
1. Add experiments on ResNets of different depths.
2. Network morphism for continual learning (seamlessly adding network modules); possibly as a new project.
3. Experiments on ultra-deep networks may not be needed.
4. Multi-node setting (Tensor people: [name=John, Pau Springer])
5. Try language models. (Experts on transformers: [name=Zhiding, Chaowei])