---
title: 'Multi-Class GAN'
disqus: hackmd
---

###### tags:`IIPP 2024`

Multi-Class GAN
===

## Table of Contents

[TOC]

## Project Overview

Initial experiments showed that the data is not in time-series format, so TimeGAN cannot handle the problem. The experiments also showed that the task is not a regression problem; the reliable formulation is multi-class classification.

The objective of this project is to address the imbalanced multi-class problem using GANs. Synthetic data generated by a GAN is expected to help the classifiers learn the minority classes and to perform better than the same classical algorithms trained without oversampling.

Project Timeline
---

```mermaid
gantt
    title Project Plan September - October

    section Datasets
    EDA                  :a1, 2024-09-03, 22d
    Multi-Class CLF      :after a1, 20d
    section GAN
    GAN experiment       :2024-09-25, 20d
```

```mermaid
gantt
    title Project Plan October - November

    section GAN
    Result Documentation :a1, 2024-10-03, 22d
    section Publication
    Writing Paper        :2024-10-29, 30d
```

> Github Repo:

## Exploratory Data Analysis

Data Source: https://www.kaggle.com/datasets/sudhanshu2198/processed-data-credit-score

![data proportion](https://hackmd.io/_uploads/Skrs67ckyl.png)
> The graph shows that the credit score classes are imbalanced.

![correlation matrix](https://hackmd.io/_uploads/SyuyAmckkg.png)
> Correlation matrix between the target class Credit_Score and the features (most of the features are correlated).

## CLF Result before GAN

> CLF for dataset 1 -> predict the customer segment using DT, KNN, LR

![DT credit score 1](https://hackmd.io/_uploads/SJHFAQckJl.png)
> Classification report before GAN on DT
> 0 = good
> 1 = poor
> 2 = standard (majority)

## Framework

> Research Framework

## Multi-Class CLF Result after GAN

> Based on the earlier experiment, TimeGAN was not applicable because the data is not in time-series format, so tabular GAN models were tried instead.
> EDA -> Tabular GAN -> Multi-Class CLF
> GAN models: WGAN-GP | CTGAN | CopulaGAN

![DT CTGAN credit score 1](https://hackmd.io/_uploads/HJosC79J1x.png)
> After CTGAN, the DT result improved from an average of 0.74 to 0.82.
> The minority classes (0 and 1) improved significantly: F1 rose from 0.681 to 0.875 for class 0 and from 0.732 to 0.827 for class 1.

## Summary of experiment results

> F1-score result per class

| Alg/class | NONE  | CTGAN | CopulaGAN | DraGAN | WGAN-GP |
| --------- | ----- | ----- | --------- | ------ | ------- |
| DT/       |       |       |           |        |         |
| 0         | 0.681 | 0.875 | 0.785     | 0.882  | 0.928   |
| 1         | 0.732 | 0.827 | 0.754     | 0.810  | 0.735   |
| 2         | 0.763 | 0.757 | 0.709     | 0.751  | 0.763   |
| KNN/      |       |       |           |        |         |
| 0         | 0.646 | 0.760 | 0.760     | 0.842  | 0.920   |
| 1         | 0.761 | 0.744 | 0.748     | 0.792  | 0.764   |
| 2         | 0.768 | 0.742 | 0.729     | 0.748  | 0.768   |
| LR/       |       |       |           |        |         |
| 0         | 0.337 | 0.714 | 0.617     | 0.741  | 0.843   |
| 1         | 0.538 | 0.725 | 0.671     | 0.711  | 0.515   |
| 2         | 0.689 | 0.461 | 0.503     | 0.585  | 0.640   |
| XGB/      |       |       |           |        |         |
| 0         | 0.673 | 0.857 | 0.828     | 0.870  | 0.909   |
| 1         | 0.758 | 0.857 | 0.820     | 0.831  | 0.753   |
| 2         | 0.838 | 0.773 | 0.757     | 0.775  | 0.784   |
| RF/       |       |       |           |        |         |
| 0         | 0.735 | 0.901 | 0.849     | 0.904  | 0.936   |
| 1         | 0.804 | 0.891 | 0.839     | 0.867  | 0.806   |
| 2         | 0.816 | 0.813 | 0.797     | 0.813  | 0.814   |
| LGB/      |       |       |           |        |         |
| 0         | 0.736 | 0.905 | 0.875     | 0.906  | 0.938   |
| 1         | 0.776 | 0.875 | 0.852     | 0.849  | 0.783   |
| 2         | 0.800 | 0.798 | 0.785     | 0.798  | 0.802   |

> Overall accuracy result

| Alg | NONE  | CTGAN | CopulaGAN | DraGAN | WGAN-GP   |
| --- | ----- | ----- | --------- | ------ | --------- |
| DT  | 0.739 | 0.819 | 0.748     | 0.817  | **0.838** |
| KNN | 0.743 | 0.748 | 0.745     | 0.796  | **0.841** |
| LR  | 0.604 | 0.642 | 0.599     | 0.671  | **0.718** |
| XGB | 0.831 | 0.831 | 0.803     | 0.829  | **0.843** |
| RF  | 0.797 | 0.869 | 0.829     | 0.863  | **0.873** |
| LGB | 0.781 | 0.859 | 0.837     | 0.854  | **0.865** |

> Mean rank on F1 score (rank 1 = best, so lower is better)

| Algorithm | NONE | CTGAN | CopulaGAN | DraGAN   | WGAN-GP  |
| --------- | ---- | ----- | --------- | -------- | -------- |
| DT        | 3.83 | 2.33  | 4.00      | 2.67     | **2.17** |
| KNN       | 3.17 | 4.17  | 4.17      | 2.00     | **1.50** |
| LGB       | 4.00 | 2.50  | 3.67      | 2.83     | **2.00** |
| LR        | 3.33 | 3.00  | 3.67      | **2.33** | 2.67     |
| RF        | 3.67 | 2.50  | 4.00      | 2.50     | **2.33** |
| XGB       | 3.33 | 2.67  | 4.00      | **2.33** | 2.67     |

> Accuracy graph plot

![performance graph](https://hackmd.io/_uploads/rJthoh3eJe.png)

## Publication writing

Journal target:
- Applied Intelligence (Springer), closed access
- Progress in Artificial Intelligence, closed access

Title draft:
- The Effectiveness of a GAN-Based Oversampling Method for Multi-Class Credit Score Classification

## Main References

> B. Zhu, X. Pan, S. vanden Broucke, and J. Xiao, “A GAN-based hybrid sampling method for imbalanced customer classification,” Inf Sci (N Y), vol. 609, pp. 1397–1411, 2022, doi: https://doi.org/10.1016/j.ins.2022.07.145.

> CTGAN: https://arxiv.org/pdf/1907.00503

> WGAN-GP: https://arxiv.org/pdf/1704.00028

> CopulaGAN: https://www.sciencedirect.com/science/article/pii/S2667096823000241

> draGAN: https://www.sciencedirect.com/science/article/pii/S0957417423034541

###### tags: `GAN-based` `Multi-Class` `Customer credit score`
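
The mean rank values in the summary of experiment results can be reproduced from the per-class F1 scores. The notes do not state the ranking convention, so the sketch below assumes one that is consistent with the reported numbers: within each class the highest F1 gets rank 1, tied scores share the mean of their ranks, and the ranks are then averaged over the three classes. The function names are hypothetical, and only the DT rows are shown.

```python
# F1 scores for the Decision Tree (DT) rows of the per-class F1 table,
# keyed by class label; column order follows METHODS.
METHODS = ["NONE", "CTGAN", "CopulaGAN", "DraGAN", "WGAN-GP"]
DT_F1 = {
    0: [0.681, 0.875, 0.785, 0.882, 0.928],
    1: [0.732, 0.827, 0.754, 0.810, 0.735],
    2: [0.763, 0.757, 0.709, 0.751, 0.763],
}


def average_ranks(scores):
    """Rank scores so the highest gets rank 1; ties share the mean rank.

    Assumed convention: the notes do not say how ties are handled.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Mean of the 1-based positions pos+1 .. end+1.
        shared = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def mean_ranks(f1_by_class):
    """Average each method's rank over all classes, rounded to 2 decimals."""
    per_class = [average_ranks(row) for row in f1_by_class.values()]
    n_classes = len(per_class)
    return [round(sum(col) / n_classes, 2) for col in zip(*per_class)]


dt_mean_ranks = dict(zip(METHODS, mean_ranks(DT_F1)))
print(dt_mean_ranks)
# {'NONE': 3.83, 'CTGAN': 2.33, 'CopulaGAN': 4.0, 'DraGAN': 2.67, 'WGAN-GP': 2.17}
```

The output matches the DT row of the mean rank table. In class 2, NONE and WGAN-GP tie at 0.763 and each receive rank 1.5, which is why NONE ends at 3.83 rather than 3.67 or 4.00.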