# [Style Intervention: How to Achieve Spatial Disentanglement with Style-based Generators?](https://arxiv.org/pdf/2011.09699.pdf)

The authors analyse the latent space of StyleGAN and propose a manipulation method on the style space, termed *S*, using interpolation and a task-specific loss function for face attribute editing.

**Latent Space Analysis:**
- Unsupervised approaches use classical ML techniques such as PCA to discover latent directions and interpret their semantic meaning for generation.
- Supervised methods solve for the manipulated latent vector under the supervision of semantic labels.

**Face Attribute Editing (FAE):**
- FAE aims at manipulating a target attribute of an input image while keeping irrelevant content intact.
- Most methods introduce new networks or loss functions.
- To reduce the computation required, the paper instead proposes to control the generator's behaviour by manipulating the latent codes.

### Style Space Translation
- The translation operates on feature maps; the components required for the target attribute are marked as c, and those not required as c'.
- To edit the semantics of c precisely, the authors use a separating hyperplane with normal vector n<sub>α</sub>, with the displacement vector of the style channel(s) satisfying certain conditions.

![](https://i.imgur.com/mC0R394.png)

- This means interpolating along the displacement ∆Ŝ would yield the desired output.
- ∆Ŝ is approximated by ∆s<sub>n</sub>, a normal vector in S that classifies style codes according to the labels of the translated attribute α.
- Sparsity regularization is imposed on ∆s<sub>n</sub> to meet the specified conditions.

### Style Intervention
- ∆s<sub>n</sub> contains the minimum information needed to translate the input; thus, as we translate in Z (z → z'), it is plausible to do the same for the corresponding displacement vector:
- ∆s<sub>z</sub> = f(z) − f(z')
- Translation in Z produces realistic semantic changes but with entanglement, while translation in S exerts precise modifications but is not photo-realistic. To combine the two, an intervention coefficient λ is optimised with several losses.
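The translation step above can be sketched as follows. This is a toy numpy sketch, not the paper's implementation: style codes are plain vectors, the classifier normal ∆s<sub>n</sub> is approximated by a difference of class means instead of a trained linear classifier, and a hard top-k threshold stands in for the sparsity regularization. All names (`style_dim`, `top_k`, `alpha`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
style_dim = 512  # illustrative style-code dimensionality

# Synthetic style codes labelled by a binary attribute alpha.
s_pos = rng.normal(loc=0.5, scale=1.0, size=(100, style_dim))   # attribute present
s_neg = rng.normal(loc=-0.5, scale=1.0, size=(100, style_dim))  # attribute absent

# Stand-in for the classifier's hyperplane normal: the direction
# separating the two labelled classes of style codes.
delta_s_n = s_pos.mean(axis=0) - s_neg.mean(axis=0)
delta_s_n /= np.linalg.norm(delta_s_n)

# Sparsity: keep only the top-k channels and zero the rest, so the edit
# touches only the channels relevant to the target components c.
top_k = 32
mask = np.zeros(style_dim)
mask[np.argsort(np.abs(delta_s_n))[-top_k:]] = 1.0
delta_s_sparse = delta_s_n * mask
delta_s_sparse /= np.linalg.norm(delta_s_sparse)

# Interpolating along the sparse direction edits the attribute;
# alpha controls the edit strength.
s = s_neg[0]
alpha = 3.0
s_edited = s + alpha * delta_s_sparse

# Only the selected channels change; all others are left exactly intact.
changed = np.nonzero(s_edited - s)[0]
print(changed.size)  # 32
```

The point of the sparsity step is visible in the last lines: the edit is confined to a small set of style channels, which is what allows the spatially localised translation the paper describes.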
![](https://i.imgur.com/cKfKWIX.png)

**Pixel-level Loss:**
- λ is penalised for modifying the image anywhere other than c, using a binary mask.

![](https://i.imgur.com/hUZaPUN.png)

**Attribute Loss:**
- To ensure the image shows the target change, the attribute loss is taken between s<sub>n</sub> and s<sub>m</sub> as a cosine loss.

![](https://i.imgur.com/5B51Pvg.png)

**L2 Loss:**
- To make sure the image does not deviate from the generation manifold, an L2 loss is applied to λ.

The overall loss function is:

![](https://i.imgur.com/aUSX9kh.png)

### Separability of Style Space
- The model is trained on Face++ labels, and most facial attributes are found to be linearly separable.
- This indicates that style codes contain the semantic information of image attributes.

![](https://i.imgur.com/xb9afUS.png)

### Spatially Disentangled Image Translation
- The authors fix λ* (the optimal λ) and vary the coefficient of ∆s<sub>n</sub> to check whether the translation can be interpolated over only a target area.
- The output shows that translating along ∆s<sub>n</sub> affects only the targeted content.

![](https://i.imgur.com/g3EiwuH.png)

- The intermediate generations are also visualised, and the changes between successive layers are observed to be discrete and monotonic.
- This implies that a facial component is treated as a whole and embedded in a neighbouring feature map, rather than as a collection of texture patches across successive layers.

![](https://i.imgur.com/UpDPwBB.png)
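A minimal sketch of how the three losses could be combined, using numpy arrays in place of generated images and style codes. The weights `w_attr` and `w_l2` and all helper names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pixel_loss(img_edit, img_orig, mask):
    """Penalise changes outside the target region c (mask == 1 inside c)."""
    diff = (img_edit - img_orig) * (1.0 - mask)  # keep only out-of-region changes
    return np.mean(diff ** 2)

def attribute_loss(s_n, s_m):
    """Cosine loss between style directions: 1 - cos(s_n, s_m)."""
    cos = np.dot(s_n, s_m) / (np.linalg.norm(s_n) * np.linalg.norm(s_m))
    return 1.0 - cos

def l2_loss(lam):
    """Keep lambda small so the edit stays near the generation manifold."""
    return np.sum(lam ** 2)

def total_loss(img_edit, img_orig, mask, s_n, s_m, lam, w_attr=1.0, w_l2=0.01):
    # Weighted sum of the three terms; the weights are illustrative.
    return (pixel_loss(img_edit, img_orig, mask)
            + w_attr * attribute_loss(s_n, s_m)
            + w_l2 * l2_loss(lam))

# Toy usage: an edit confined to the masked region c, perfectly aligned style
# directions, and a zero lambda give a (near-)zero total loss.
rng = np.random.default_rng(1)
img = rng.normal(size=(8, 8))
mask = np.zeros((8, 8)); mask[2:6, 2:6] = 1.0
img_edit = img + mask * 0.5          # edit only inside c
s = rng.normal(size=16)
lam = np.zeros(16)
loss = total_loss(img_edit, img, mask, s, s, lam)
print(loss)
```

Each term pulls in the direction the summary describes: the pixel term localises the edit, the attribute term forces the target change, and the L2 term on λ keeps the result on the generation manifold.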