# Notes 2023-02-16
## <u>Predicting the change in sigmoid-outputs</u>

## <u>Predicting the change in logits</u>
For the logits, I am plotting:
$|f_{w_{*}} - f_{w_{-i}}| = |\alpha_{-i} \cdot v_i|$
Note that $\alpha_{-i}$ are the leave-one-out residuals of the neural network, unlike in the equation for the sigmoid-outputs!
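For reference, a minimal numerical check of this relationship (the array names and how $v_i$ is obtained are assumptions on my side; only the formula itself is from these notes): compare the predicted change $|\alpha_{-i} \cdot v_i|$ against the logit difference measured after actually retraining without example $i$.

```python
import numpy as np
from scipy.stats import spearmanr

def check_logit_prediction(f_star, f_loo, alpha_loo, v):
    # f_star[i]:    logit of the base model f_{w_*} on training example i
    # f_loo[i]:     logit of the retrained model f_{w_{-i}} on example i
    # alpha_loo[i]: leave-one-out residual alpha_{-i}
    # v[i]:         the per-example factor v_i from the equation above
    actual = np.abs(f_star - f_loo)      # |f_{w_*} - f_{w_{-i}}|
    predicted = np.abs(alpha_loo * v)    # |alpha_{-i} * v_i|
    rho = spearmanr(actual, predicted).correlation
    return actual, predicted, rho
```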
See also the old writeup:

## <u>New result (australian-scale)</u>
Single hidden layer MLP, 100 neurons, ~2000 parameters
Train acc 0.875, test acc 0.878
`train_config = {'delta': 1e1, 'n_epochs': 1000, 'lr': 1e-2, 'bs_train': 32}`
`retrain_config = {'n_epochs': 5, 'lr': 1e-2, 'bs_retrain': 32}`
Changes compared to before:
- $w_*$ trained for more epochs
- larger regularizer ("banana-shaped" memory maps)
- nothing else changed in the code
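A minimal sketch of what the two configs above describe (framework, optimizer, and how `delta` enters the loss are assumptions on my side; the notes only fix the hyperparameter values): train $w_{*}$ once, then for each training point $i$ warm-start from $w_{*}$ and retrain for a few epochs with that point left out.

```python
import copy
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def make_mlp(d_in, width=100):
    # single hidden layer MLP, 100 hidden neurons, one output logit
    return nn.Sequential(nn.Linear(d_in, width), nn.ReLU(), nn.Linear(width, 1))

def fit(model, X, y, n_epochs, lr, bs, delta=0.0):
    # X: float tensor (N, d), y: float tensor (N,) with 0/1 labels;
    # delta enters as an L2 penalty (weight decay) -- an assumption, the notes
    # only give its value
    opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=delta)
    loader = DataLoader(TensorDataset(X, y), batch_size=bs, shuffle=True)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(n_epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb).squeeze(-1), yb).backward()
            opt.step()
    return model

def train_base(X, y, cfg):
    # full training run for the base model w_*
    model = make_mlp(X.shape[1])
    return fit(model, X, y, cfg['n_epochs'], cfg['lr'], cfg['bs_train'], cfg['delta'])

def retrain_without(model_star, X, y, i, cfg, delta):
    # warm-start from w_* and briefly retrain on the data without example i
    keep = torch.arange(len(X)) != i
    model_i = copy.deepcopy(model_star)
    return fit(model_i, X[keep], y[keep],
               cfg['n_epochs'], cfg['lr'], cfg['bs_retrain'], delta)

# usage (X, y are torch tensors; configs as above):
#   w_star = train_base(X, y, train_config)
#   w_minus_0 = retrain_without(w_star, X, y, i=0, cfg=retrain_config,
#                               delta=train_config['delta'])
```

With only 5 warm-start epochs at batch size 32, each of the $N$ leave-one-out retrainings stays cheap relative to the 1000-epoch base run.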
### Memory map

### Change in logits

### Change in sigmoid-outputs: Equations 13, 15/16, 17
**Equation 13**

**Equation 15 = Equation 16**

**Equation 17**

### Change in sigmoid-outputs: Approximations using only components of Eq. 17
**lambda**

**alpha**

**lambda x alpha**
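One way to compare these single-component approximations against the full product is to rank the training points by each quantity and measure how well each ranking agrees with the measured change in sigmoid-outputs. A hedged sketch (the per-example arrays `lam`, `alpha`, `v` and the measured changes `delta_sigmoid` are assumed to be precomputed; their names are mine):

```python
import numpy as np
from scipy.stats import spearmanr

def rank_agreement(delta_sigmoid, lam, alpha, v):
    # delta_sigmoid[i] = |h(f_{w_*}) - h(f_{w_{-i}})| measured by retraining
    candidates = {
        'lambda':             np.abs(lam),
        'alpha':              np.abs(alpha),
        'lambda x alpha':     np.abs(lam * alpha),
        'alpha x lambda x v': np.abs(alpha * lam * v),  # full Eq. 17-style product
    }
    return {name: spearmanr(delta_sigmoid, approx).correlation
            for name, approx in candidates.items()}
```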

<!--
## <u>New result (USPS)</u>
binary USPS
CNN, ~5000 parameters
-->
## <u>Conclusion & To Do's</u>
**Conclusion about old results**
- Some of my old experiments may have been "degenerate":
    - too small a regularizer, L-shaped memory map, overfitting
    - base model $w_{*}$ not trained for long enough
**Conclusion about new results**
- $|h(f_{w_{*}}) - h(f_{w_{-i}})| = |\alpha_i \cdot \lambda_i \cdot v_i|$ can also hold for a large regularizer
- $|f_{w_{*}} - f_{w_{-i}}| = |\alpha_{-i} \cdot v_i|$ is also a valid linear relationship?
- base model $w_{*}$ needs to be optimized until convergence!
- approximations (aim: predicting $|h(f_{w_{*}}) - h(f_{w_{-i}})|$):
    - for a well-regularized and properly optimized model: $|\alpha_i \cdot \lambda_i \cdot v_i|$ > $\lambda_i \cdot \alpha_i$ > $\lambda_i$ and $\alpha_i$
- the approximation error due to the GGN (for larger residuals) does not seem that large in this case?
**To Do's**
- Show this on more datasets and larger models