### Autoencoder Components
**This is part of Neuraldemy's [Tutorial On Autoencoders](https://neuraldemy.com/autoencoders-fundamentals-of-encoders-and-decoders/)**
1. **Encoder Function:**
The encoder maps the input $\mathbf{x}$ to a latent representation $\mathbf{z}$.
$$\mathbf{z} = f(\mathbf{x}) = \sigma(\mathbf{W}_e \mathbf{x} + \mathbf{b}_e)$$
- $\mathbf{x}$: Input data
- $\mathbf{z}$: Latent representation
- $\mathbf{W}_e$: Encoder weight matrix
- $\mathbf{b}_e$: Encoder bias vector
- $f$: Encoder function
- $\sigma$: Activation function (e.g., ReLU, sigmoid)
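As a minimal sketch (assuming NumPy, with illustrative sizes: a 784-dimensional input, a 32-dimensional latent code, and ReLU as $\sigma$), the encoder step can be written as:

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

# Illustrative sizes: a 784-dimensional input compressed to a 32-dimensional latent code
rng = np.random.default_rng(0)
W_e = rng.normal(scale=0.01, size=(32, 784))  # encoder weight matrix W_e
b_e = np.zeros(32)                            # encoder bias vector b_e

def encode(x):
    """z = f(x) = sigma(W_e x + b_e), with ReLU as the activation."""
    return relu(W_e @ x + b_e)

x = rng.random(784)   # a dummy input vector
z = encode(x)         # latent representation, shape (32,)
```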
2. **Decoder Function:**
The decoder reconstructs the input from the latent representation $\mathbf{z}$. Its activation need not match the encoder's; a sigmoid output is common when the inputs are scaled to $[0, 1]$.
$$\hat{\mathbf{x}} = g(\mathbf{z}) = \sigma(\mathbf{W}_d \mathbf{z} + \mathbf{b}_d)$$
- $\hat{\mathbf{x}}$: Reconstructed output
- $\mathbf{W}_d$: Decoder weight matrix
- $\mathbf{b}_d$: Decoder bias vector
- $g$: Decoder function
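A matching decoder sketch under the same assumptions (NumPy, 32 latent units mapped back to 784 dimensions, sigmoid as the output activation):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Mirror of the encoder sketch: 32 latent units mapped back to 784 output units
rng = np.random.default_rng(0)
W_d = rng.normal(scale=0.01, size=(784, 32))  # decoder weight matrix W_d
b_d = np.zeros(784)                           # decoder bias vector b_d

def decode(z):
    """x_hat = g(z) = sigma(W_d z + b_d), with a sigmoid output activation."""
    return sigmoid(W_d @ z + b_d)

z = rng.random(32)     # a dummy latent vector
x_hat = decode(z)      # reconstruction, shape (784,), values in (0, 1)
```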
3. **Reconstruction Loss:**
The reconstruction loss measures the difference between the input $\mathbf{x}$ and the reconstructed output $\hat{\mathbf{x}}$.
For Mean Squared Error (MSE):
$$L(\mathbf{x}, \hat{\mathbf{x}}) = \frac{1}{n}\|\mathbf{x} - \hat{\mathbf{x}}\|^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2$$
For Binary Cross-Entropy (appropriate when the inputs are binary or scaled to $[0, 1]$ and the decoder uses a sigmoid output):
$$L(\mathbf{x}, \hat{\mathbf{x}}) = -\sum_{i=1}^{n} \left[ x_i \log(\hat{x}_i) + (1 - x_i) \log(1 - \hat{x}_i) \right]$$
- $L(\mathbf{x}, \hat{\mathbf{x}})$: Reconstruction loss
- $\|\mathbf{x} - \hat{\mathbf{x}}\|^2$: Squared Euclidean distance between $\mathbf{x}$ and $\hat{\mathbf{x}}$
- $x_i$: $i$-th element of the input
- $\hat{x}_i$: $i$-th element of the reconstructed output
- $n$: Number of elements in the input
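Both losses are easy to compute directly. Here is a small NumPy sketch with dummy data; the clipping constant `eps` is only a numerical safeguard, not part of the formula:

```python
import numpy as np

def mse(x, x_hat):
    """Mean squared error: (1/n) * ||x - x_hat||^2."""
    return np.mean((x - x_hat) ** 2)

def binary_cross_entropy(x, x_hat, eps=1e-12):
    """-sum_i [x_i log(x_hat_i) + (1 - x_i) log(1 - x_hat_i)].
    x_hat is clipped away from 0 and 1 so the logarithms stay finite."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=784).astype(float)  # dummy binary input
x_hat = rng.random(784)                         # dummy reconstruction in (0, 1)
print(mse(x, x_hat), binary_cross_entropy(x, x_hat))
```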
4. **Total Loss (with regularization):**
The total loss includes the reconstruction loss and a regularization term $R(\theta)$.
$$L_{\text{total}} = \sum_{i=1}^{N} L(\mathbf{x}^{(i)}, \hat{\mathbf{x}}^{(i)}) + \lambda R(\theta)$$
- $L_{\text{total}}$: Total loss
- $N$: Number of training samples
- $\lambda$: Regularization parameter
- $R(\theta)$: Regularization term (e.g., weight decay)
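The sketch below illustrates this combined objective with an L2 weight penalty as $R(\theta)$; the helper names and the $\lambda$ value are illustrative, not from the tutorial:

```python
import numpy as np

def l2_penalty(weights):
    """R(theta) as weight decay: the sum of squared weights."""
    return sum(np.sum(W ** 2) for W in weights)

def total_loss(X, X_hat, weights, lam=1e-4):
    """Sum of per-sample reconstruction losses plus lambda * R(theta).

    X and X_hat are (N, n) arrays of inputs and reconstructions;
    `weights` is a list of weight matrices, e.g. [W_e, W_d]."""
    recon = np.sum(np.mean((X - X_hat) ** 2, axis=1))  # sum of per-sample MSE over N samples
    return recon + lam * l2_penalty(weights)
```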
5. **Training Objective:**
The objective is to minimize the total loss function with respect to the parameters $\theta$.
$$\min_{\theta} \sum_{i=1}^{N} L(\mathbf{x}^{(i)}, \hat{\mathbf{x}}^{(i)}) + \lambda R(\theta)$$
- $\theta$: Model parameters, i.e., the encoder and decoder weights and biases $\mathbf{W}_e, \mathbf{b}_e, \mathbf{W}_d, \mathbf{b}_d$
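Putting the pieces together, here is a rough training sketch assuming PyTorch; the layer sizes, learning rate, and $\lambda$ (passed as `weight_decay`) are illustrative choices, and the data is a random stand-in rather than a real dataset:

```python
import torch
from torch import nn, optim

# A minimal fully connected autoencoder; layer sizes are illustrative.
class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(latent_dim, in_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
criterion = nn.MSELoss()
# weight_decay adds the lambda * R(theta) term (an L2 penalty) to the objective
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

data = torch.rand(256, 784)  # dummy training data in [0, 1]
loader = torch.utils.data.DataLoader(data, batch_size=64, shuffle=True)

for epoch in range(5):
    for batch in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch), batch)  # reconstruction loss L(x, x_hat)
        loss.backward()                        # gradients with respect to theta
        optimizer.step()                       # one gradient step toward the minimum
```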