{%hackmd theme-dark %}

The residual for one pixel, measured over $K$ polarization states, is

$$R = \sum_{j=0}^{K-1} (H_j M_{N-1} \ldots M_i \ldots M_0 G_j S - D_j) \tag{1}$$

where $S$ is a $4 \times 1$ Stokes vector representing unpolarized light ($S = \begin{bmatrix} 1 && 0 && 0 && 0 \end{bmatrix}^T$), and $G_j$, $H_j$ are the Mueller matrices corresponding to each set of PEM and polarizer. The quantities to be optimized are the coefficients defining the $N$ Mueller matrices $M_i$.

Consider one Mueller matrix $M_i$. We can rewrite its residual (1) for a single polarization state as

$$R_j = H_j O_i M_i L_i G_j S - D_j \tag{2}$$

where $L_i$ is the total state change due to the variable matrices _before_ $M_i$, and $O_i$ is the total state change due to those _after_ $M_i$. In other words:

$$
\begin{aligned}
L_i &= M_{i-1} \ldots M_0 \\
O_i &= M_{N-1} \ldots M_{i+1}
\end{aligned}
$$

The following is easier to work out if we change the formulation slightly: combine all the polarizations $D_j$ into a single Mueller matrix $\mu$ and treat the related problem

$$\rho = O_i M_i L_i - \mu \tag{3}$$

We compute the scalar-by-matrix derivative of a norm of this residual (writing $M$ for $M_i$ for simplicity, and $r(\rho) = \Vert \rho \Vert^2$), using a theorem of matrix calculus (I will be lazy and use $d$ instead of $\partial$):

$$\bigg(\frac{d r}{d M}\bigg)_{ij} = \text{tr}\left(\frac{dr(\rho)}{d\rho}\begin{bmatrix} \dfrac{d\rho_{00}}{dM_{ij}} && \ldots && \dfrac{d\rho_{03}}{dM_{ij}} \\ \vdots && \ddots && \vdots \\ \dfrac{d\rho_{30}}{dM_{ij}} && \ldots && \dfrac{d\rho_{33}}{dM_{ij}} \end{bmatrix}\right) \tag{4}$$

The first derivative is easy if we use, e.g., the squared Frobenius norm: it evaluates to $2\rho^T$ by the rule $\dfrac{dX^TAX}{dX} = X^T(A + A^T)$ with the identity matrix as $A$.
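As a quick check of this step, in the same loose notation: the squared Frobenius norm is just a sum of squared entries, so it can be differentiated entry by entry. The transpose appears only because the derivative with respect to $\rho$ is taken to have the shape of $\rho^T$; that layout convention is an assumption inferred from the trace in (4), not something stated explicitly.

$$
r(\rho) = \Vert \rho \Vert_F^2 = \text{tr}(\rho^T \rho) = \sum_a \sum_b \rho_{ab}^2
\quad\Rightarrow\quad
\frac{dr}{d\rho_{ab}} = 2\rho_{ab}
\quad\Rightarrow\quad
\frac{dr}{d\rho} = 2\rho^T
$$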
Now we are left with

$$\bigg(\frac{d r}{d M}\bigg)_{ij} = \text{tr}\left(2 \rho^T \dfrac{d\rho}{dM_{ij}}\right)$$

We now need to apply a few matrix-by-scalar identities:

$$\dfrac{d\rho}{dM_{ij}} = \dfrac{d(OML)}{dM_{ij}} - \dfrac{d\mu}{dM_{ij}}$$

$\mu$ is not a function of $M$, so its derivative vanishes and we have

$$\dfrac{d\rho}{dM_{ij}} = \dfrac{d(OML)}{dM_{ij}} = O\dfrac{dM}{dM_{ij}}L$$

The remaining partial derivative is simply the matrix unit $E_{ij}$ (the matrix which is zero everywhere except for entry $ij$, which is one). Applying the trace identity $\text{tr}(A^T B) = \sum_a \sum_b A_{ab} B_{ab}$ gives

$$\bigg(\frac{d r}{d M}\bigg)_{ij} = \text{tr}(2 \rho^T O E_{ij} L) = 2\sum_a \sum_b \rho_{ab} (OE_{ij}L)_{ab} \tag{5}$$

Thus, given the residual matrix $\rho$, we may calculate each entry $ij$ of the gradient of $r$ with respect to each variable matrix $M$ by finding the corresponding matrices $O$ and $L$, and then evaluating (5) for each matrix unit $E_{ij}$.

The gradient with respect to the underlying representation (e.g., an index ellipsoid) then follows from the relationship between the Mueller matrix elements and the underlying representation via the chain rule. E.g., if we have

$$M_{11} = f(n)$$

and no other elements depend on $n$, then

$$\frac{dr}{dn} = \bigg(\frac{dr}{dM}\bigg)_{11} \frac{dM_{11}}{dn}$$

We can potentially simplify the calculation by directly inserting the derivative of $M$ with respect to each component of the underlying representation into (5), in place of $E_{ij}$.
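A minimal numerical sketch of (5) follows, assuming random $4 \times 4$ stand-ins for $O$, $M$, $L$ and $\mu$ (they are not physical Mueller matrices). All function names are hypothetical and chosen for illustration. It evaluates (5) entry by entry with matrix units, compares the result against the equivalent rearrangement $2\,O^T \rho L^T$ (obtained by cycling the trace; this closed form is not part of the derivation above), and checks both against a central finite difference of $r = \Vert \rho \Vert^2$. Plain nested lists are used so the sketch has no dependencies.

```python
import random

N = 4  # Mueller matrices are 4x4


def matmul(a, b):
    """Product of two N x N matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def residual(O, M, L, mu):
    """rho = O M L - mu, as in (3)."""
    oml = matmul(matmul(O, M), L)
    return [[oml[a][b] - mu[a][b] for b in range(N)] for a in range(N)]


def sq_frobenius(rho):
    """r(rho) = squared Frobenius norm."""
    return sum(x * x for row in rho for x in row)


def gradient_entry(O, M, L, mu, i, j):
    """Entry ij of dr/dM, evaluated directly from (5) with the matrix unit E_ij."""
    rho = residual(O, M, L, mu)
    E = [[1.0 if (a == i and b == j) else 0.0 for b in range(N)]
         for a in range(N)]
    OEL = matmul(matmul(O, E), L)
    return 2.0 * sum(rho[a][b] * OEL[a][b] for a in range(N) for b in range(N))


def gradient_closed_form(O, M, L, mu):
    """Whole gradient at once: 2 O^T rho L^T (a rearrangement of (5))."""
    rho = residual(O, M, L, mu)
    g = matmul(matmul(transpose(O), rho), transpose(L))
    return [[2.0 * x for x in row] for row in g]


def finite_difference(O, M, L, mu, i, j, h=1e-6):
    """Central difference of r with respect to M_ij, used only as a check."""
    plus = [row[:] for row in M]
    minus = [row[:] for row in M]
    plus[i][j] += h
    minus[i][j] -= h
    r_plus = sq_frobenius(residual(O, plus, L, mu))
    r_minus = sq_frobenius(residual(O, minus, L, mu))
    return (r_plus - r_minus) / (2.0 * h)


def random_matrix(rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(N)]


rng = random.Random(0)  # fixed seed so the check is repeatable
O, M, L, mu = (random_matrix(rng) for _ in range(4))

closed = gradient_closed_form(O, M, L, mu)
indices = [(i, j) for i in range(N) for j in range(N)]

# Largest disagreement between (5) and the closed form, and between (5)
# and the finite-difference estimate.
max_err_closed = max(abs(gradient_entry(O, M, L, mu, i, j) - closed[i][j])
                     for i, j in indices)
max_err_fd = max(abs(gradient_entry(O, M, L, mu, i, j)
                     - finite_difference(O, M, L, mu, i, j))
                 for i, j in indices)
```

Evaluating (5) once per matrix unit costs two matrix products per entry, whereas the rearranged form needs two products for the whole gradient, which may matter when this runs per pixel. The same `gradient_closed_form` result can then be contracted with $dM/dn$ for each parameter $n$ of the underlying representation to apply the chain rule.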