[toc]
# Method

FairDisCo has three branches: a target branch, a sensitive-attribute (SA) branch, and a contrastive branch (a layout sketch follows this list).
1. The target branch predicts skin conditions.
2. The SA branch decouples skin-type information from the representations.
3. The contrastive branch enhances feature extraction.
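A minimal PyTorch sketch of this three-branch layout is below. The `FairDisCoNet` name, the ResNet-18 backbone, and the class counts / projection dimension are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class FairDisCoNet(nn.Module):
    """Shared feature extractor with a target head, an SA head f_s, and a
    contrastive projection head H (hypothetical sizes)."""
    def __init__(self, num_diseases=8, num_skin_types=6, proj_dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                # backbone outputs the representation z
        self.encoder = backbone
        self.target_head = nn.Linear(feat_dim, num_diseases)   # target branch: disease logits
        self.sa_head = nn.Linear(feat_dim, num_skin_types)     # SA branch: skin-type logits
        self.proj_head = nn.Sequential(                        # contrastive branch: r = H(z)
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.target_head(z), self.sa_head(z), self.proj_head(z), z
```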
## Disentangled Representation Learning
The SA branch consists of a classifier $f_s$ that predicts the skin type $p^s$ based on the representation $z$. Methods for obtaining disentangled representations for fairness through blindness fall into two groups:
1. One is adversarial learning, which applies an adversarial loss (e.g., cross-entropy on the sensitive attribute) to the SA branch and places a gradient-reversal layer (GRL) between the feature extractor and the SA classifier.
2. The other is disentangled representation learning, which uses two separate losses instead of a GRL.
FairDisCo follows the second direction. It minimizes a confusion loss
$$
L_{conf}=-\frac{1}{N}\sum_{i=1}^{N}\log p_i^s
$$
to confuse the feature extractor and remove skin-type information from the representations, where $N$ is the number of skin types and $p_i^s$ is the predicted probability of skin type $i$.
This loss is minimized when the classifier outputs the same probability $p_i^s=1/N$ for every skin type $i$. Note that the classifier $f_s$ could learn a degenerate solution, such as setting all of its weights to zero; then $L_{conf}$ becomes small even though the representation still contains skin-type information. Therefore, a skin-type predictive cross-entropy loss $L_s$ that optimizes only $f_s$ is added.
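A minimal sketch of the two SA-branch losses, assuming $f_s$ is a single `nn.Linear` head and `z` is a batch of representations. Detaching $f_s$'s weights for the confusion term (and detaching $z$ for $L_s$) is one way to realize the gradient separation described above; it is not necessarily how the paper's code does it.

```python
import torch
import torch.nn.functional as F

def sa_branch_losses(z, f_s, skin_type_labels):
    """Confusion loss (back-propagates only into the encoder via z) and
    skin-type cross-entropy L_s (updates only the SA classifier f_s)."""
    # L_conf: cross-entropy against a uniform distribution over the N skin types.
    # f_s's parameters are detached, so this term cannot push f_s itself toward uniform outputs.
    logits_conf = F.linear(z, f_s.weight.detach(), f_s.bias.detach())
    log_p = F.log_softmax(logits_conf, dim=1)
    l_conf = -log_p.mean(dim=1).mean()   # -(1/N) * sum_i log p_i^s, averaged over the batch

    # L_s: standard cross-entropy on detached features, so only f_s is updated.
    # This keeps f_s a genuine skin-type predictor and rules out the all-zero shortcut.
    l_s = F.cross_entropy(f_s(z.detach()), skin_type_labels)
    return l_conf, l_s
```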
## Contrastive Feature Extraction Enhancement
The contrastive loss promotes intra-class cohesion and inter-class diversity, protecting target features and improving representation learning. In the contrastive branch, the representation from the feature extractor is first projected into a low-dimensional latent space, $r=H(z)$. For each embedding in the mini-batch, the other embeddings with the same disease label form a positive set $P_y$, and the rest form a negative set $N_y$.
$$
L_{contr}=-\frac{1}{|P_y|}\sum_{p\in P_y}\log\frac{\exp(\Psi(r,p)/\tau)}{\exp(\Psi(r,p)/\tau)+\sum_{n\in N_y}\exp(\Psi(r,n)/\tau)}
$$
* $\Psi$: The cosine similarity between two vectors
* $\tau>0$: A temperature parameter
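A minimal per-anchor implementation of $L_{contr}$ as written above, looping over the batch for clarity rather than vectorizing; the temperature value and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(r, labels, tau=0.1):
    """Supervised contrastive loss for projected embeddings r ([B, d]) and
    disease labels ([B]); tau=0.1 is illustrative, not the paper's setting."""
    r = F.normalize(r, dim=1)                # unit vectors, so dot product = cosine similarity Psi
    sim = r @ r.t() / tau                    # Psi(r_i, r_j) / tau
    B = r.size(0)
    self_mask = torch.eye(B, dtype=torch.bool, device=r.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask   # P_y per anchor
    neg_mask = ~pos_mask & ~self_mask                                      # N_y per anchor

    losses = []
    for i in range(B):
        if not pos_mask[i].any():            # anchors without positives contribute nothing
            continue
        pos_sims = sim[i][pos_mask[i]]
        neg_sum = torch.exp(sim[i][neg_mask[i]]).sum()
        # -log( exp(s_p) / (exp(s_p) + sum_n exp(s_n)) ), averaged over the positives
        losses.append((-pos_sims + torch.log(torch.exp(pos_sims) + neg_sum)).mean())
    return torch.stack(losses).mean() if losses else r.new_zeros(())
```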
The final loss function for FairDisCo is
$$
L_{total}=L_c(\theta_\phi,\theta_{f_c})+\alpha L_{conf}(\theta_\phi)+L_s(\theta_{f_s})+\beta L_{contr}(\theta_\phi,\theta_H)
$$
* $\alpha$ and $\beta$ adjust the contributions of the confusion loss and the contrastive loss, respectively.
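As a toy illustration of the weighting (the $\alpha$ and $\beta$ values and the placeholder scalars below are not from the paper), the total loss is just the weighted sum of the four terms sketched earlier:

```python
import torch

# Placeholder scalar tensors standing in for L_c, L_conf, L_s, L_contr from the sketches above.
l_c, l_conf, l_s, l_contr = [torch.rand((), requires_grad=True) for _ in range(4)]

alpha, beta = 1.0, 0.5   # hypothetical weights; the paper tunes alpha and beta
l_total = l_c + alpha * l_conf + l_s + beta * l_contr
l_total.backward()       # a single backward pass propagates all four terms
```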
**References**
* FairDisCo: Fairer AI in Dermatology via Disentanglement Contrastive Learning