# NSGAII-SCC

[TOC]

## Example 1: Education performance

* Purpose of the Example:

> An education institute wants to investigate students' learning performance and study which factors strongly affect students' learning.

| Dataset | Symbol | Description | Example |
| ------- | ------ | ----------- | ------- |
| D | P | Dataset with p dimensions (all features) | |
| Q | q | The subset of D that contains the "target interest" attributes | Student performance <br> • GPA |
| X | x | The data domain of factors | Student information <br> • average hours of studying <br> • the aptitude test result <br> • gender <br> • age <br> • demographic meaning |

---

| Method | Symbol | Description |
| -------------- |:------:|:------------------:|
| Clustering | Ω | L = Ω(Q) |
| Classification | Φ | L' = Φ(X under L) |

## Example 2: Experiment setting

* Purpose of the Example:
    1. Identify the labels of the "target interest" in dataset Q to describe the hidden pattern
    2. Search for the attributes of X that are highly correlated with the identified labels
    3. Maintain the clustering and classification quality with the multi-objective optimization method

### Flow chart

![](https://i.imgur.com/qAHDExd.png)
![](https://i.imgur.com/XbubHlt.png)

### Dataset explanation

- <font color=red>Rossmann</font> and <font color=red>Wal-Mart</font> sales data were used in the experiment

| Dataset | Original Description | Preprocessing |
|:----------------------:| ---------------------- | ---------------------- |
| Rossmann<br> (German) | - Daily sales of 1115 stores with 27 features from Jan 2013 to Jul 2015<br> - 19 features aggregated to show monthly sales for clustering (Q)<br> - The X dataset contains the variables that might be correlated with sales performance, such as promotions, store type, competition distance, school holidays, state holidays, sales seasonality, location, and assortment level | Aggregate into Monthly |
| Walmart<br> (US) | - Weekly sales of 45 stores with 29 features from Feb 2010 to Oct 2012<br> - 22 features representing the monthly sales (Q)<br> - The 6 remaining attributes are store type, average temperature of the region, average fuel price for each month, consumer price index of the region, the unemployment rate of the region, and the number of state holidays; they represent characteristics of the region where the store is located | Aggregate into Monthly |

![](https://i.imgur.com/GrUwrnT.png)

- The X dataset was used for classification after the labels of the stores were generated by the clustering process on the Q dataset.
- Both datasets contain information that might correlate with the sales performance of each store, including <font color=blue>promotions in the time period</font>, <font color=blue>store type</font>, <font color=blue>state holidays</font>, <font color=blue>sale seasonality</font>, <font color=blue>locality</font>, <font color=blue>consumer price index</font>, and <font color=blue>fuel price</font>.

| Clustering | Kmeans, Agglomerative hierarchical clustering |
| -------- | -------- |
| Classification | ANN, KNN, SVM, Decision tree |

What's the difference between NSGA and NSGAII?

> NSGAII adds an elitism mechanism and crowding-distance assignment.
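
To make the crowding-distance assignment more concrete, here is a minimal sketch in Python. It is not taken from this project's code: the function name `crowding_distance` and the input layout (one tuple of objective values per solution, all from the same non-dominated front) are assumptions made for illustration. In this setting the objectives could be, for example, a clustering quality score and a classification quality score.

```python
import math


def crowding_distance(front):
    """Return one crowding distance per solution in a single non-dominated front.

    `front` is a list of objective vectors (tuples of floats) of equal length.
    Hypothetical helper for illustration only.
    """
    n = len(front)
    if n == 0:
        return []
    distances = [0.0] * n
    n_objectives = len(front[0])
    for m in range(n_objectives):
        # Indices of the solutions sorted by their value on objective m.
        order = sorted(range(n), key=lambda i: front[i][m])
        f_min = front[order[0]][m]
        f_max = front[order[-1]][m]
        # Boundary solutions get infinite distance so they are always preferred.
        distances[order[0]] = math.inf
        distances[order[-1]] = math.inf
        if f_max == f_min:
            continue  # objective is constant on this front; nothing to add
        for k in range(1, n - 1):
            # Normalized gap between the two neighbours along objective m.
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            distances[order[k]] += gap / (f_max - f_min)
    return distances


# Example front with two objectives per solution.
example_front = [(0.0, 1.0), (0.2, 0.6), (0.5, 0.5), (1.0, 0.0)]
example_distances = crowding_distance(example_front)
```

A larger distance means the solution sits in a less crowded region of the front. When NSGAII has to choose among solutions of the same rank, it prefers the ones with larger distance, which helps keep the surviving solutions spread out.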