**Missing Data Imputation using Generative Adversarial Nets**

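###### tags: Paper Reading

## Outline

This paper describes how to use GAIN (Generative Adversarial Imputation Nets) to impute missing features.

## Motivation / Background

Missing features come up often when solving real problems: sometimes because data was lost, sometimes because the data was scarce to begin with. A method that can fill in missing features is therefore necessary. Such a method is not limited to recovering missing features; it can also be applied in other areas such as data compression and image blending.

Missing data falls into three categories: MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random). The method in this paper is proposed for the MCAR setting. Imputation methods can in turn be divided into

* discriminative methods
    * MICE (Buuren & Oudshoorn, 2000; Buuren & Groothuis-Oudshoorn, 2011)
    * MissForest (Stekhoven & Bühlmann, 2011)
    * matrix completion (Mazumder et al., 2010a; Yu et al., 2016; Schnabel et al., 2016; Mazumder et al., 2010b)
* generative methods
    * based on Expectation Maximization (García-Laencina et al., 2010)
    * based on deep learning, e.g. denoising autoencoders (DAE) and generative adversarial nets (GAN) (Vincent et al., 2008; Gondara & Wang, 2017; Allen & Li, 2016)

## Model

The authors then point out the shortcomings of the generative imputation algorithms above (see Section 1, third paragraph), which motivates GAIN, the GAN-based approach proposed in this paper.

The GAIN architecture consists of

* ### Generator
    * The Generator wants to maximize the Discriminator's error
    * input: the data **X** (with missing entries), the mask **M**, random noise **Z**
    * output: the data vector **x'**, i.e. the data with its missing entries filled in
* ### Discriminator
    * The Discriminator wants to minimize its own error, i.e. to catch every component the Generator filled in
    * input: **x'** from the Generator
    * output: for each component of **x'**, a judgment of whether it was produced by the Generator

Below, **G**: Generator, **D**: Discriminator, **M**: Mask

D is trained to maximize how accurately it predicts the mask M, that is, D should correctly judge for every component of x' whether it was generated by G. G is trained to minimize how accurately D predicts M: where M is 0 (the imputed positions) G should fool D, and where M is 1 (the observed positions) G's output should stay close to the true values.

![](https://i.imgur.com/tDtWLwE.png)

## Experiments

Finally, the paper evaluates GAIN experimentally and compares it with other algorithms. See Section 6.

## Conclusion

This paper is fairly hard to understand and to explain. It was published at ICML last year. It has the feel of a GAN, but its training procedure is actually not quite the same as a GAN's. I think its main contribution is bringing the GAN idea into the field of missing-feature imputation and showing that GANs can work there.

Finally, here is the code for the paper. If you want to understand it, I find that reading the code directly helps more.

* TF version (original authors): https://github.com/jsyoon0823/GAIN
* PyTorch version: https://github.com/lethaiq/GAIN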
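
To make the training objective described above more concrete, here is a minimal sketch of the two losses in plain Python. It is a simplified illustration, not the authors' implementation: the function names (`combine`, `discriminator_loss`, `generator_loss`), the weight `alpha`, and the toy numbers are all assumptions made for this note, the networks themselves are replaced by hand-picked outputs, and the hint mechanism from the paper is left out.

```python
import math

def combine(x, m, g_out):
    """Keep observed entries (m == 1); take the generator's value elsewhere."""
    return [mi * xi + (1 - mi) * gi for xi, mi, gi in zip(x, m, g_out)]

def discriminator_loss(m, d_prob, eps=1e-8):
    """Cross-entropy between the true mask and D's per-component probability
    that a component was observed. D is trained to minimize this, which is
    the same as maximizing how well it predicts M."""
    total = sum(
        mi * math.log(pi + eps) + (1 - mi) * math.log(1 - pi + eps)
        for mi, pi in zip(m, d_prob)
    )
    return -total / len(m)

def generator_loss(x, m, g_out, d_prob, alpha=10.0, eps=1e-8):
    """Adversarial term on imputed entries (m == 0) plus a reconstruction
    term on observed entries (m == 1). alpha is an assumed weight."""
    # G wants D to believe the imputed entries were observed (d_prob -> 1).
    adversarial = -sum(
        (1 - mi) * math.log(pi + eps) for mi, pi in zip(m, d_prob)
    ) / len(m)
    # On observed entries, G's output should stay close to the true value.
    n_observed = max(sum(m), 1)
    reconstruction = sum(
        mi * (xi - gi) ** 2 for xi, mi, gi in zip(x, m, g_out)
    ) / n_observed
    return adversarial + alpha * reconstruction

# Toy example: four features, the 2nd and 4th are missing (mask 0).
x = [0.2, 0.0, 0.9, 0.0]         # missing entries hold a placeholder 0.0
m = [1, 0, 1, 0]                 # 1 = observed, 0 = to be imputed
g_out = [0.25, 0.6, 0.8, 0.1]    # pretend output of the Generator
d_prob = [0.9, 0.3, 0.8, 0.5]    # pretend output of the Discriminator

x_filled = combine(x, m, g_out)  # → [0.2, 0.6, 0.9, 0.1]
print(x_filled)
print(discriminator_loss(m, d_prob))
print(generator_loss(x, m, g_out, d_prob))
```

In an actual training loop, `g_out` and `d_prob` would come from neural networks and the two losses would be minimized alternately; the repositories linked above show how that is done in TensorFlow and PyTorch.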