### Autoencoder Components

**This is part of Neuraldemy's [Tutorial On Autoencoders](https://neuraldemy.com/autoencoders-fundamentals-of-encoders-and-decoders/)**

1. **Encoder Function:** The encoder maps the input $\mathbf{x}$ to a latent representation $\mathbf{z}$.

   $$\mathbf{z} = f(\mathbf{x}) = \sigma(\mathbf{W}_e \mathbf{x} + \mathbf{b}_e)$$

   - $\mathbf{x}$: Input data
   - $\mathbf{z}$: Latent representation
   - $\mathbf{W}_e$: Encoder weight matrix
   - $\mathbf{b}_e$: Encoder bias vector
   - $f$: Encoder function
   - $\sigma$: Activation function (e.g., ReLU, sigmoid)

2. **Decoder Function:** The decoder reconstructs the input from the latent representation $\mathbf{z}$.

   $$\hat{\mathbf{x}} = g(\mathbf{z}) = \sigma(\mathbf{W}_d \mathbf{z} + \mathbf{b}_d)$$

   - $\hat{\mathbf{x}}$: Reconstructed output
   - $\mathbf{W}_d$: Decoder weight matrix
   - $\mathbf{b}_d$: Decoder bias vector
   - $g$: Decoder function

3. **Reconstruction Loss:** The reconstruction loss measures the difference between the input $\mathbf{x}$ and the reconstructed output $\hat{\mathbf{x}}$.

   For Mean Squared Error (MSE):

   $$L(\mathbf{x}, \hat{\mathbf{x}}) = \|\mathbf{x} - \hat{\mathbf{x}}\|^2$$

   For Binary Cross-Entropy:

   $$L(\mathbf{x}, \hat{\mathbf{x}}) = -\sum_{i=1}^{n} \left[ x_i \log(\hat{x}_i) + (1 - x_i) \log(1 - \hat{x}_i) \right]$$

   - $L(\mathbf{x}, \hat{\mathbf{x}})$: Reconstruction loss
   - $\|\mathbf{x} - \hat{\mathbf{x}}\|^2$: Squared Euclidean distance between $\mathbf{x}$ and $\hat{\mathbf{x}}$
   - $x_i$: $i$-th element of the input
   - $\hat{x}_i$: $i$-th element of the reconstructed output
   - $n$: Number of elements in the input

4. **Total Loss (with regularization):** The total loss includes the reconstruction loss, summed over the training samples, and a regularization term $R(\theta)$.

   $$L_{\text{total}} = \sum_{i=1}^{N} L(\mathbf{x}^{(i)}, \hat{\mathbf{x}}^{(i)}) + \lambda R(\theta)$$

   - $L_{\text{total}}$: Total loss
   - $N$: Number of training samples
   - $\lambda$: Regularization parameter
   - $R(\theta)$: Regularization term (e.g., weight decay)

5. **Training Objective:** The objective is to minimize the total loss function with respect to the parameters $\theta$.

   $$\min_{\theta} \sum_{i=1}^{N} L(\mathbf{x}^{(i)}, \hat{\mathbf{x}}^{(i)}) + \lambda R(\theta)$$

   - $\theta$: Model parameters (weights and biases)
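The following is a minimal sketch of how these five pieces fit together in code. It assumes PyTorch, a single linear layer for each of the encoder and decoder, sigmoid activations, an input dimension of 784, and a latent dimension of 32; none of these choices come from the tutorial itself, and the weight-decay setting stands in for the $\lambda R(\theta)$ term.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: z = sigma(W_e x + b_e)
        self.encoder = nn.Sequential(nn.Linear(input_dim, latent_dim), nn.Sigmoid())
        # Decoder: x_hat = sigma(W_d z + b_d)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)       # latent representation z = f(x)
        x_hat = self.decoder(z)   # reconstruction x_hat = g(z)
        return x_hat

model = Autoencoder()
# Reconstruction loss: mean squared error, i.e. the squared distance
# ||x - x_hat||^2 averaged over elements
criterion = nn.MSELoss()
# weight_decay adds an L2 penalty on the parameters, playing the role
# of lambda * R(theta) in the total loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

# One training step on a random batch (stand-in for real data)
x = torch.rand(64, 784)       # batch of inputs x^(i)
x_hat = model(x)              # reconstructions x_hat^(i)
loss = criterion(x_hat, x)    # reconstruction term of L_total
optimizer.zero_grad()
loss.backward()               # gradients of the objective w.r.t. theta
optimizer.step()              # one step of min over theta
```

Swapping `nn.MSELoss` for `nn.BCELoss` gives the binary cross-entropy variant of the reconstruction loss, provided the inputs and the decoder outputs both lie in $[0, 1]$ (the sigmoid output layer guarantees this for the reconstructions).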