---
title: 'Multi-Class GAN'
disqus: hackmd
---
###### tags: `IIPP 2024`
Multi-Class GAN
===
## Table of Contents
[TOC]
## Project Overview
Initial experiments showed that the data is not in time-series format, so TimeGAN cannot be applied. The experiments also showed that the task is not a regression problem; it is better framed as a multi-class classification problem.
The objective of this project is to address the imbalanced multi-class problem using GANs.
Synthetic data generated by a GAN is expected to help the classifiers learn the minority classes and perform better than the same classical algorithms trained without oversampling.
Project Timeline
---
```mermaid
gantt
title Project Plan September - October
section Datasets
EDA :a1, 2024-09-03, 22d
Multi-Class CLF :after a1, 20d
section GAN
GAN experiment :2024-09-25, 20d
```
```mermaid
gantt
title Project Plan October - November
section GAN
Result Documentation :a1, 2024-10-03, 22d
section Publication
Writing Paper :2024-10-29, 30d
```
> GitHub Repo:
## Exploratory Data Analysis
Data Source: https://www.kaggle.com/datasets/sudhanshu2198/processed-data-credit-score

> The graph shows that the credit score classes are imbalanced.
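
The imbalance can also be stated numerically from the label column. The following is a minimal sketch, assuming the classes are stored as integer labels; the toy label list and the helper names `class_shares` and `imbalance_ratio` are hypothetical and are not taken from the Kaggle data or the project code.

```python
from collections import Counter

def class_shares(labels):
    """Return {class: fraction of rows} for a list of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}

def imbalance_ratio(labels):
    """Largest class count divided by the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical toy labels for illustration only (NOT the real dataset).
toy_labels = [2] * 5 + [1] * 3 + [0] * 2

shares = class_shares(toy_labels)
ratio = imbalance_ratio(toy_labels)
```

A ratio close to 1 would indicate balanced classes; the larger the ratio, the stronger the case for oversampling the smaller classes.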

> Correlation matrix between the target class `Credit_Score` and the features (most of the features are correlated).
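
One common way to obtain the feature-to-target column of such a matrix is the Pearson coefficient. The sketch below uses plain Python and assumes that is the coefficient in question, which this note does not state. The feature names and values are hypothetical, and treating the integer-coded `Credit_Score` as a numeric variable is itself an assumption, since the codes are class labels rather than measurements.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

def correlation_with_target(columns, target):
    """columns: {feature name: values} -> {feature name: correlation with target}."""
    return {name: pearson(values, target) for name, values in columns.items()}

# Hypothetical toy data for illustration only.
credit_score = [0, 1, 2, 2, 1, 0]
features = {
    "feature_a": [10, 20, 30, 30, 20, 10],  # rises with the label code
    "feature_b": [30, 20, 10, 10, 20, 30],  # falls with the label code
}
corr = correlation_with_target(features, credit_score)
```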
## CLF Result before GAN
> Classification (CLF) on dataset 1: predict the customer segment using Decision Tree (DT), k-Nearest Neighbors (KNN), and Logistic Regression (LR).

> Classification report before GAN, using DT
> 0 = good
> 1 = poor
> 2 = standard (majority)
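
The per-class scores in a classification report follow the standard definition F1 = 2TP / (2TP + FP + FN), computed one class at a time. The sketch below implements that definition in plain Python; the function name `per_class_f1` and the toy predictions are hypothetical, and it is not necessarily the tooling used for the report above.

```python
def per_class_f1(y_true, y_pred, classes=(0, 1, 2)):
    """Return {class: F1 score}, treating each class as one-vs-rest."""
    scores = {}
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted rows gets 0.0 by convention here.
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores

# Hypothetical toy labels: 0 = good, 1 = poor, 2 = standard.
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 2, 1, 2, 2, 2, 2, 1]
f1_by_class = per_class_f1(y_true, y_pred)
```

Reporting the score per class, rather than only overall accuracy, is what makes weak minority-class performance visible.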
## Framework
>Research Framework
>
## Multi-Class CLF Result after GAN
> Recent experiments showed that TimeGAN is not applicable because the data is not in time-series format, so tabular GAN models are used instead.
> Pipeline: EDA -> Tabular GAN -> Multi-Class CLF
> GAN models: WGAN-GP | CTGAN | CopulaGAN
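
The "Tabular GAN" step of this pipeline can be sketched as follows for CTGAN. This is only one possible arrangement, under assumptions that this note does not confirm: one generator is fitted per under-represented class, every class is topped up to the majority count, and only the training split is oversampled. The function names, the `label_col` default, and the `epochs` value are illustrative. `pandas` and `ctgan` are third-party packages and are imported lazily inside the function.

```python
from collections import Counter

def rows_needed_per_class(labels):
    """Synthetic rows each class needs to reach the majority class count."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in sorted(counts.items())}

def oversample_with_ctgan(train_df, label_col="Credit_Score",
                          discrete_columns=(), epochs=300):
    """Append CTGAN samples for every class below the majority count.

    train_df: pandas DataFrame holding the TRAINING split only (assumption).
    discrete_columns: names of categorical feature columns; must not
    include label_col, because the label is dropped before fitting.
    """
    import pandas as pd       # third-party, imported lazily
    from ctgan import CTGAN   # third-party, imported lazily

    parts = [train_df]
    deficits = rows_needed_per_class(train_df[label_col].tolist())
    for cls, n_missing in deficits.items():
        if n_missing == 0:
            continue  # majority class: nothing to generate
        class_rows = train_df.loc[train_df[label_col] == cls]
        features = class_rows.drop(columns=[label_col])
        model = CTGAN(epochs=epochs)
        model.fit(features, discrete_columns=list(discrete_columns))
        synthetic = model.sample(n_missing)
        synthetic[label_col] = cls  # every sample inherits the class it was fitted on
        parts.append(synthetic)
    return pd.concat(parts, ignore_index=True)

# The helper alone, on hypothetical toy labels (0 = good, 1 = poor, 2 = standard).
deficits = rows_needed_per_class([2, 2, 2, 2, 2, 1, 1, 1, 0, 0])
```

The classifier is then trained on the returned frame and evaluated on an untouched test split, so that synthetic rows never appear in the evaluation data.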

> After CTGAN oversampling, the DT result improved from an average of 0.74 to 0.82.
> The minority classes (0 and 1) improved significantly.
## Summary of experiment results
> F1-score results per class (NONE = baseline without GAN oversampling)

| Alg/class | NONE | CTGAN | CopulaGAN | DraGAN | WGAN-GP |
| --------- | ----- | ----- | --------- | --- | ------- |
| DT/ | | | | | |
| 0 | 0.681 | 0.875 | 0.785 |0.882 |0.928 |
| 1 | 0.732 | 0.827 | 0.754 |0.810 |0.735 |
| 2 | 0.763 | 0.757 | 0.709 |0.751 |0.763 |
| KNN/ | | | | | |
| 0 | 0.646 | 0.760 | 0.760 |0.842 |0.920 |
| 1 | 0.761 | 0.744 | 0.748 |0.792 |0.764 |
| 2 | 0.768 | 0.742 | 0.729 |0.748 |0.768 |
| LR/ | | | | | |
| 0 | 0.337 | 0.714 | 0.617 |0.741 |0.843 |
| 1 | 0.538 | 0.725 | 0.671 |0.711 |0.515 |
| 2 | 0.689 | 0.461 | 0.503 |0.585 |0.640 |
| XGB/ | | | | | |
| 0 | 0.673 | 0.857 | 0.828 |0.870 |0.909 |
| 1 | 0.758 | 0.857 | 0.820 |0.831 |0.753 |
| 2 | 0.838 | 0.773 | 0.757 |0.775 |0.784 |
| RF/ | | | | | |
| 0 | 0.735 | 0.901 | 0.849 |0.904 |0.936 |
| 1 | 0.804 | 0.891 | 0.839 |0.867 |0.806 |
| 2 | 0.816 | 0.813 | 0.797 |0.813 |0.814 |
| LGB/ | | | | | |
| 0 | 0.736 | 0.905 | 0.875 |0.906 |0.938 |
| 1 | 0.776 | 0.875 | 0.852 |0.849 |0.783 |
| 2 | 0.800 | 0.798 | 0.785 |0.798 |0.802 |
> Overall accuracy results

| Alg | NONE | CTGAN | CopulaGAN | DraGAN | WGAN-GP |
| ----| ----- | ----- | --------- | --- | ------- |
| DT | 0.739 | 0.819 | 0.748 |0.817 |**0.838** |
| KNN | 0.743 | 0.748 | 0.745 |0.796 |**0.841** |
| LR | 0.604 | 0.642 | 0.599 |0.671 |**0.718** |
| XGB | 0.831 | 0.831 | 0.803 |0.829 |**0.843** |
| RF | 0.797 | 0.869 | 0.829 |0.863 |**0.873** |
| LGB | 0.781 | 0.859 | 0.837 |0.854 |**0.865** |
> Mean rank on F1-score (lower is better)

| Alg | NONE | CTGAN | CopulaGAN | DraGAN | WGAN-GP |
| --- | ---- | ----- | --------- | ------ | ------- |
| DT  | 3.83 | 2.33 | 4.00 | 2.67 | **2.17** |
| KNN | 3.17 | 4.17 | 4.17 | 2.00 | **1.50** |
| LGB | 4.00 | 2.50 | 3.67 | 2.83 | **2.00** |
| LR  | 3.33 | 3.00 | 3.67 | **2.33** | 2.67 |
| RF  | 3.67 | 2.50 | 4.00 | 2.50 | **2.33** |
| XGB | 3.33 | 2.67 | 4.00 | **2.33** | 2.67 |
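
The DT row of the mean rank table is consistent with the following procedure: within each class, rank the five methods by F1-score (rank 1 = highest, tied values share the average rank), then average the ranks over the three classes. This procedure is inferred from the numbers and is not stated in this note. The sketch below reproduces the DT row from the per-class F1 values in the F1-score table; the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values so the largest gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the group
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def mean_rank(table):
    """table: {method: [score per class]} -> {method: mean rank over classes}."""
    methods = list(table)
    n_classes = len(next(iter(table.values())))
    totals = {m: 0.0 for m in methods}
    for c in range(n_classes):
        ranks = average_ranks([table[m][c] for m in methods])
        for m, r in zip(methods, ranks):
            totals[m] += r
    return {m: round(totals[m] / n_classes, 2) for m in methods}

# DT F1-scores for classes 0, 1, 2, copied from the F1-score table.
dt_f1 = {
    "NONE": [0.681, 0.732, 0.763],
    "CTGAN": [0.875, 0.827, 0.757],
    "CopulaGAN": [0.785, 0.754, 0.709],
    "DraGAN": [0.882, 0.810, 0.751],
    "WGAN-GP": [0.928, 0.735, 0.763],
}
dt_mean_rank = mean_rank(dt_f1)
# -> {'NONE': 3.83, 'CTGAN': 2.33, 'CopulaGAN': 4.0, 'DraGAN': 2.67, 'WGAN-GP': 2.17}
```

Because the rank is averaged over classes, a method that helps the minority classes but slightly hurts the majority class can still rank well overall.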
>Accuracy Graph Plot

## Publication writing
Journal targets:
- Applied Intelligence (Springer), closed access
- Progress in Artificial Intelligence, closed access

Title draft:
- The Effectiveness of a GAN-Based Oversampling Method for Multi-Class Credit Score Classification
## Main References
> B. Zhu, X. Pan, S. vanden Broucke, and J. Xiao, “A GAN-based hybrid sampling method for imbalanced customer classification,” Information Sciences, vol. 609, pp. 1397–1411, 2022, doi: https://doi.org/10.1016/j.ins.2022.07.145.
> CTGAN: https://arxiv.org/pdf/1907.00503
> WGAN-GP: https://arxiv.org/pdf/1704.00028
> CopulaGAN: https://www.sciencedirect.com/science/article/pii/S2667096823000241
> draGAN: https://www.sciencedirect.com/science/article/pii/S0957417423034541
###### tags: `GAN-based` `Multi-Class` `Customer credit score`