# Layer Sweep

## Metric

$$ completeness_{layer} = \frac{p_{layer}(object) - p_{-1}(object)}{p_{-1}(object)}$$

$$ contribution_{layer} = \frac{p_{layer}(object) - p_{layer - 1}(object)}{p_{-1}(object)}$$

The causal score is measured by restoring $h_{subject}$ at different layers in a corrupted run (one with a different subject), but only at the `subject_last` position.

$$ causal\_score_{layer} = \frac{p(answer | h_{subject}^{layer} \text{ restored on corrupted run}) - p(answer | \text{clean run})}{p(answer | \text{clean run})}$$

| completeness | contribution | causal_score | faithfulness |
| -------- | -------- | -------- | -------- |
| ![](https://hackmd.io/_uploads/SkrcPhpN3.png) | ![](https://hackmd.io/_uploads/S1Div26N3.png) | ![](https://hackmd.io/_uploads/rJlTDnaNn.png) | ![](https://hackmd.io/_uploads/ByqkO36Nn.png) |
| ![](https://hackmd.io/_uploads/rksBunp4n.png) | ![](https://hackmd.io/_uploads/HyCIOhaE3.png) | ![](https://hackmd.io/_uploads/B18vdhT4n.png) | ![](https://hackmd.io/_uploads/SkRPd2TN3.png) |
| ![](https://hackmd.io/_uploads/S1YoOhTN3.png) | ![](https://hackmd.io/_uploads/rkghd3aEh.png) | ![](https://hackmd.io/_uploads/SyOhO2aNn.png) | ![](https://hackmd.io/_uploads/rkJ6d3p42.png) |
| ![](https://hackmd.io/_uploads/r1T1YhpV3.png) | ![](https://hackmd.io/_uploads/B1PgK2TNn.png) | ![](https://hackmd.io/_uploads/ryMZY2T43.png) | ![](https://hackmd.io/_uploads/S1qZF36E3.png) |
| ![](https://hackmd.io/_uploads/H1hEtn6Vn.png) | ![](https://hackmd.io/_uploads/H1LSFhaV3.png) | ![](https://hackmd.io/_uploads/rJRrt3643.png) | ![](https://hackmd.io/_uploads/H1ULKhaN3.png) |
| ![](https://hackmd.io/_uploads/BkCqYn6En.png) | ![](https://hackmd.io/_uploads/BJ_oYhaEn.png) | ![](https://hackmd.io/_uploads/HkAsF26E3.png) | ![](https://hackmd.io/_uploads/SymnY3TVn.png) |
| ![](https://hackmd.io/_uploads/Sk_PgRa43.png) | ![](https://hackmd.io/_uploads/r11_xApV2.png) | ![](https://hackmd.io/_uploads/rkruxC64n.png) | ![](https://hackmd.io/_uploads/BJideC642.png) |
| ![](https://hackmd.io/_uploads/SkKRx5ANn.png) | ![](https://hackmd.io/_uploads/HJCCe5RV3.png) | ![](https://hackmd.io/_uploads/BJB1b504n.png) | ![](https://hackmd.io/_uploads/Sk7WbcAN3.png) |

<!--
relation = `The capital of {} is`
Each calculated with 3 icl examples

$$ completeness_{layer} = \frac{p_{layer}(object) - p_{-1}(object)}{p_{-1}(object)}$$

![](https://hackmd.io/_uploads/BkeBdztVn.png)

---

$$ contribution_{layer} = \frac{p_{layer}(object) - p_{layer - 1}(object)}{p_{-1}(object)}$$

![](https://hackmd.io/_uploads/SkKgmddNn.png)

| J_norm | Jh_norm |
| -------- | -------- |
| ![](https://hackmd.io/_uploads/BkWMJYdE2.png) | ![](https://hackmd.io/_uploads/rJxL7yKOE3.png) |

---

## Causal Tracing

![](https://hackmd.io/_uploads/Sy74QlsN3.png)

## ICL-Mean estimation performance at different layers

![](https://hackmd.io/_uploads/H1AdJiKV3.png)
-->
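
The three metrics defined under Metric reduce to simple arithmetic once the per-layer probabilities are available. The following is a minimal sketch, not the code that produced the plots. It assumes that $p_{-1}$ denotes the probability read out at the final layer (a Python-style negative index), and that `p_layers` is a plain list of $p_{layer}(object)$ values ordered from the first to the last layer. The function names and the input values are hypothetical. Because the contribution formula needs $p_{layer-1}$, the sketch leaves the first layer out instead of guessing a baseline for it.

```python
from typing import List, Sequence


def completeness(p_layers: Sequence[float]) -> List[float]:
    """(p_layer - p_final) / p_final for each layer.

    Assumption: p_{-1} in the notes is the final layer, i.e. p_layers[-1].
    """
    p_final = p_layers[-1]
    return [(p - p_final) / p_final for p in p_layers]


def contribution(p_layers: Sequence[float]) -> List[float]:
    """(p_layer - p_{layer-1}) / p_final for layers 1..L-1.

    The first layer has no predecessor, so the result has one fewer
    entry than p_layers (entry i corresponds to layer i + 1).
    """
    p_final = p_layers[-1]
    return [
        (p_layers[i] - p_layers[i - 1]) / p_final
        for i in range(1, len(p_layers))
    ]


def causal_score(p_restored: float, p_clean: float) -> float:
    """Relative change of p(answer) when h_subject from one layer is
    restored on the corrupted run, measured against the clean run."""
    return (p_restored - p_clean) / p_clean


# Hypothetical probabilities for a 4-layer model, for illustration only.
example_p_layers = [0.05, 0.10, 0.30, 0.60]
print(completeness(example_p_layers))
print(contribution(example_p_layers))
print(causal_score(p_restored=0.45, p_clean=0.60))
```

Under these definitions the contributions telescope: their sum equals $(p_{-1} - p_{0}) / p_{-1}$, which is the negative of the first layer's completeness. This gives a quick sanity check on a sweep.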