# Customer Segmentation 4 bis (Deep Learning): 09/11/2023
* Number of clusters: **4**
* Segmentation year: **2023**
* Selected features:
  * Customer_Financial_Health
  * Customer_Commercial_Region
  * Customer_Size
  * Customer_Seniority_YearsGroup
  * XSelling_NoLegalEntity
  * XSelling_NoProducts
  * Amount_Invoicing
  * Amount_Invoicing_Per_Person
  * Customer_Size_Evolution
  * Customer_Invoicing_Evolution
  * Customer_Invoicing_PerPerson_Evolution
  * Customer_NoOfCases_Evolution
  * Customer_NoOfProducts_Evolution
  * Customer_WithEmployee_Evolution
  * Customer_Size_Evolution%
  * Customer_Invoicing_Evolution%
  * Customer_Invoicing_PerPerson_Evolution%
  * Customer_NoOfCases_Evolution%
  * Customer_NoOfProducts_Evolution%
  * Customer_MainPartner_Type_Segment
  * partner_nb
  * cases_nb
  * cases_complaint_ratio
  * case_mean_working_hours
  * updated_customer_segment
  * Customer_Sector_category
  * Digital_Sessions
  * contract_direct_ratio
## Clustering-specific metrics:
### Silhouette Score
* **Definition**:
This metric calculates the mean silhouette coefficient of all samples. For each sample, let a be its average distance to the other members of its own cluster (cohesion) and b its average distance to the members of the nearest cluster to which it doesn't belong (separation). The sample's silhouette coefficient is (b - a) / max(a, b), which ranges from -1 to 1.
* **Metric meaning**:
Silhouette Score considers both how close points in the same cluster are to each other and how separated a cluster is from its nearest neighboring cluster.
* **Interpretation**:
- A score close to 1 implies the sample is well clustered.
- A score close to 0 implies the sample is on or very close to the decision boundary between two neighboring clusters.
- A score close to -1 implies the sample is incorrectly clustered.
* **Result**:
* silhouette score: **0.29**
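
The definition above can be sketched in a few lines of plain Python. This is a minimal illustration on toy 2-D points, not the code used to produce the score reported here; the function name `mean_silhouette` and the toy data are assumptions made for the example. Libraries such as scikit-learn expose the same quantity as `sklearn.metrics.silhouette_score`; the report does not state which implementation was used.

```python
from math import dist


def mean_silhouette(points, labels):
    """Mean of s = (b - a) / max(a, b) over all samples (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (dist(p, p) == 0, so dividing by len(own) - 1 excludes p itself)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight, well-separated groups.
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = mean_silhouette(toy_points, [0, 0, 1, 1])  # labels match the groups
bad = mean_silhouette(toy_points, [0, 1, 0, 1])   # labels split the groups
print(f"good labelling: {good:.3f}, bad labelling: {bad:.3f}")
```

On the toy data the matching labelling scores close to 1 and the mismatched labelling scores below 0, in line with the interpretation above.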
### Davies-Bouldin Score
- **Definition**:
This metric evaluates the average similarity ratio of each cluster with its most similar cluster, where similarity is a ratio of within-cluster distances to between-cluster distances. Hence, the closer to 0 the DB index, the better.
- **Interpretation**:
- Lower values indicate better clustering.
- A lower Davies-Bouldin score relates to a model with better separation between the clusters.
- **Metric meaning**:
Davies-Bouldin Score evaluates the ratio of within-cluster to between-cluster distances, taking for each cluster its most similar (worst-case) neighbor and averaging over all clusters.
- **Results**:
- Davies-Bouldin: **1.16**
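
As a rough illustration of the definition, the index can be computed by hand on toy data. The sketch below assumes Euclidean distance and uses the mean distance to the centroid as the within-cluster scatter; the names `centroid` and `davies_bouldin` and the toy points are hypothetical, and this is not the code behind the value reported here. scikit-learn provides the same index as `sklearn.metrics.davies_bouldin_score`.

```python
from math import dist


def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    n = len(pts)
    return tuple(sum(coord) / n for coord in zip(*pts))


def davies_bouldin(points, labels):
    """Average, over clusters, of the worst (s_i + s_j) / d(c_i, c_j) ratio."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    cents = {lab: centroid(pts) for lab, pts in clusters.items()}
    # s_i: mean distance of the members of cluster i to its centroid
    scatter = {
        lab: sum(dist(p, cents[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }

    total = 0.0
    for i in clusters:
        # similarity to the most similar other cluster
        total += max(
            (scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Hypothetical toy data: two tight, well-separated groups.
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11)]
db_good = davies_bouldin(toy_points, [0, 0, 1, 1])
db_bad = davies_bouldin(toy_points, [0, 1, 0, 1])
print(f"good labelling: {db_good:.4f}, bad labelling: {db_bad:.4f}")
```

The well-separated labelling yields a value near 0, while the mismatched labelling yields a much larger one, which matches the "lower is better" reading.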
### Calinski-Harabasz Score (Variance Ratio Criterion)
- **Definition**:
This metric is the ratio of the between-cluster dispersion to the within-cluster dispersion, where dispersion is a sum of squared distances: from each cluster centroid to the overall centroid (weighted by cluster size) for the between-cluster term, and from each sample to its own cluster centroid for the within-cluster term. Each term is normalized by its degrees of freedom. The score has no upper bound.
- **Interpretation**:
- Higher values indicate better clustering.
- A higher Calinski-Harabasz score relates to a model with denser and better-separated clusters.
- **Metric meaning**:
Calinski-Harabasz Score evaluates the ratio of between-cluster to within-cluster dispersion, using squared distances to centroids rather than the pairwise cluster similarities used by Davies-Bouldin.
- **Results**:
- Calinski-Harabasz score: **2999.7**
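
For reference, the Calinski-Harabasz score is the between-cluster dispersion divided by the within-cluster dispersion, each normalized by its degrees of freedom (k - 1 and n - k for k clusters and n samples), so higher values are better. The following is a minimal sketch on toy data, assuming squared Euclidean distances; the function names and the toy points are hypothetical and this is not the code behind the reported value. scikit-learn provides the same score as `sklearn.metrics.calinski_harabasz_score`.

```python
def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    n = len(pts)
    return tuple(sum(coord) / n for coord in zip(*pts))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points, labels):
    """(between / (k - 1)) / (within / (n - k)); requires 1 < k < n."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = 0.0  # size-weighted squared distance of centroids to the overall centroid
    within = 0.0   # squared distance of samples to their own centroid
    for pts in clusters.values():
        c = centroid(pts)
        between += len(pts) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in pts)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two tight, well-separated groups.
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11)]
ch_good = calinski_harabasz(toy_points, [0, 0, 1, 1])
ch_bad = calinski_harabasz(toy_points, [0, 1, 0, 1])
print(f"good labelling: {ch_good:.2f}, bad labelling: {ch_bad:.2f}")
```

Because the score is unbounded and depends on the number of samples, it is mainly useful for comparing alternative clusterings of the same dataset rather than as an absolute quality measure.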
## Cluster distribution
| Cluster | Size (number of customers) | Invoicing |
| ------- | ----------------------:|:--------------------:|
| 0 | 95,761 (36.2%) | 174,671,364 (67.8%) |
| 1 | 62,415 (23.6%) | 55,935,599 (21.7%) |
| 2 | 40,092 (15.2%) | 10,818,324 (4.2%) |
| 3 | 66,314 (25.1%) | 16,031,927 (6.2%) |
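
The percentages in the table can be recomputed from the raw figures. The short sketch below reads the table's values as whole numbers (thousands separators removed, which is an assumption about the number format) and derives each cluster's share of customers and of invoicing; the variable and function names are illustrative only.

```python
# Figures copied from the table above, read as whole numbers.
customers = {0: 95_761, 1: 62_415, 2: 40_092, 3: 66_314}
invoicing = {0: 174_671_364, 1: 55_935_599, 2: 10_818_324, 3: 16_031_927}


def shares(values):
    """Percentage of the total held by each cluster, rounded to one decimal."""
    total = sum(values.values())
    return {cluster: round(100 * v / total, 1) for cluster, v in values.items()}


customer_share = shares(customers)
invoicing_share = shares(invoicing)

for cluster in customers:
    print(
        f"cluster {cluster}: {customer_share[cluster]}% of customers, "
        f"{invoicing_share[cluster]}% of invoicing"
    )
```

Read this way, cluster 0 holds about a third of the customers but roughly two thirds of the invoicing, while clusters 2 and 3 together hold about 40% of the customers and about 10% of the invoicing.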
## SHAP analysis of cluster representatives
### Cluster 0:
* SHAP feature importance graph:

* SHAP heatmaps of the most important features:
* Customer_MainPartner_Type_Segment:

* Customer_Seniority_YearsGroup:

* Customer_Commercial_Region:

* SHAPStory analysis using ChatGPT:
The AI model predicted that the customer is part of the cluster with a high probability of 99.70%. The most influential positive SHAP values were Customer_Seniority_YearsGroup (2.683895), Customer_MainPartner_Type_Segment (1.888227), and updated_customer_segment (1.701828). This suggests that the customer's long-standing relationship with the company, the type of their main partner, and their updated customer segment significantly contributed to the prediction.
On the other hand, the most influential negative SHAP values were Customer_Sector_category (-0.44299), Customer_Size (-0.206476), and contract_direct_ratio (-0.099047). This implies that the customer's sector category, their company size, and the ratio of direct to indirect contracts negatively influenced the prediction.
The interaction between these features could be that the customer, despite being in a sector or having a company size that is typically not part of the cluster, has maintained a long-term relationship with the company and has a partner type and customer segment typical of the cluster, which outweighed the negative influences.
*In conclusion, the classification occurred due to the customer's long-standing relationship, main partner type, and updated customer segment, which were strong enough to counteract the negative influences from their sector, company size, and contract ratio.*
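
The narrative above rests on picking the largest positive and the most negative SHAP contributions for one customer. A minimal sketch of that selection step is shown below, using only the six values quoted in the text; the function name `top_contributors` is hypothetical, and the remaining features and the explainer's base value are not given in the report, so they are left out.

```python
# SHAP values quoted in the Cluster 0 narrative (other features are not listed there).
shap_values = {
    "Customer_Seniority_YearsGroup": 2.683895,
    "Customer_MainPartner_Type_Segment": 1.888227,
    "updated_customer_segment": 1.701828,
    "Customer_Sector_category": -0.44299,
    "Customer_Size": -0.206476,
    "contract_direct_ratio": -0.099047,
}


def top_contributors(values, k=3):
    """Return the k largest positive and the k most negative contributions."""
    items = list(values.items())
    positive = sorted((i for i in items if i[1] > 0), key=lambda i: i[1], reverse=True)[:k]
    negative = sorted((i for i in items if i[1] < 0), key=lambda i: i[1])[:k]
    return positive, negative


positive, negative = top_contributors(shap_values)
print("pushing towards the cluster:", [name for name, _ in positive])
print("pushing away from the cluster:", [name for name, _ in negative])
```

The positive contributions listed are several times larger in magnitude than the negative ones, which is consistent with the high predicted probability described in the narrative.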
### Cluster 1:
* SHAP feature importance graph:

* SHAP heatmaps of the most important features:
* updated_customer_segment:

* Customer_MainPartner_Type_Segment:

* Customer_Financial_Health:

* SHAPStory analysis using ChatGPT:
Based on the SHAP values, the AI model predicted a high probability that the customer is part of the cluster due to several key factors. The customer's seniority, indicated by the 'Customer_Seniority_YearsGroup' feature, had the highest positive SHAP value, suggesting that newer customers are more likely to be part of the cluster. The 'Customer_MainPartner_Type_Segment' feature also had a high positive SHAP value, indicating that the type of the customer's main partner significantly influenced the prediction.
The customer's financial health and commercial region, represented by 'Customer_Financial_Health' and 'Customer_Commercial_Region' respectively, also contributed positively to the prediction. The customer's sector category, represented by 'Customer_Sector_category', and the updated customer segment, represented by 'updated_customer_segment', also had high positive SHAP values, suggesting that these aspects of the customer's profile were influential in the prediction.
On the other hand, the 'XSelling_NoProducts' feature had a negative SHAP value, implying that customers with fewer cross-selling products are less likely to be part of the cluster.
*In summary, the model's prediction was largely influenced by the customer's seniority, main partner type, financial health, commercial region, sector category, and updated customer segment. The number of cross-selling products was a negative influence.*
### Cluster 2:
* SHAP feature importance graph:

* SHAP heatmaps of the most important features:
* Customer_MainPartner_Type_Segment:

* Customer_Seniority_YearsGroup:

* updated_customer_segment:

* SHAPStory analysis using ChatGPT:
The AI model predicted with a 100% probability that the customer is part of the cluster. The most influential positive SHAP values were associated with the features 'Customer_MainPartner_Type_Segment', 'Customer_Seniority_YearsGroup', 'Customer_Commercial_Region', 'updated_customer_segment', and 'Customer_Sector_category'. This suggests that the customer's main partner type, their seniority, their commercial region, their updated customer segment, and their sector category were significant contributors to the model's prediction.
On the other hand, the most influential negative SHAP values were associated with the features 'Customer_Size_Evolution', 'Customer_Invoicing_PerPerson_Evolution', and 'Customer_Invoicing_Evolution%'. This indicates that the evolution of the customer's size, the evolution of invoicing per person, and the percentage evolution of total invoicing were factors that decreased the likelihood of the customer being part of the cluster.
*The interaction between these features could suggest that the customer, despite changes in size and invoicing, has maintained a consistent relationship with their main partner, has a long-standing seniority, and operates in a specific commercial region and sector. This combination of factors led the model to predict that the customer is part of the cluster.*
### Cluster 3:
* SHAP feature importance graph:

* SHAP heatmaps of the most important features:
* updated_customer_segment:

* Customer_Financial_Health:

* Customer_MainPartner_Type_Segment:

* SHAPStory analysis using ChatGPT:
## Business Features Analysis of all clusters
* Customer Segment:

* Activity Sectors:
