# Customer Segmentation 5 (4 clusters): 07/11/2023
* Number of clusters: **4**
* Segmentation year: **2023**
* Selected Features:
* Customer_Commercial_Region
* Customer_Size
* Customer_Seniority_YearsGroup
* Customer_Flag_IN
* Customer_DirectIndirect
* Customer_WithEmployee_Evolution
* Customer_MainPartner_Type_Segment
* case_main_origin
* updated_customer_segment
* Customer_Sector_category
* case_main_type_cat
* XSelling_NoProducts_cat
* Amount_Invoicing_cat
* contract_mean_duration_years_cat
* contract_flag_in_count_cat
* contract_flag_out_count_cat
* cases_nb_cat
* Digital_Sessions_cat
* cases_complaint_ratio_cat
* case_mean_working_hours_cat
* Amount_Invoicing_Per_Person_cat
* Customer_Size_Evolution%_cat
* Customer_NoOfCases_Evolution%_cat
* Customer_NoOfProducts_Evolution%_cat
* Customer_Invoicing_Evolution_cat
* Customer_Invoicing_Evolution%_cat
* Customer_Invoicing_PerPerson_Evolution%_cat
* Customer_NoOfCases_Evolution_cat
* Customer_NoOfProducts_Evolution_cat
* Customer_Invoicing_PerPerson_Evolution_cat
* cluster_id
## Clustering-specific metrics:
### Silhouette Score
* **Definition**:
This metric is the mean silhouette coefficient over all samples. For each sample, let `a` be its average distance to the other members of its own cluster (cohesion) and `b` its average distance to the members of the nearest cluster it does not belong to (separation). The sample's silhouette coefficient is `(b - a) / max(a, b)`, which ranges from -1 to 1.
* **Metric meaning**:
Silhouette Score considers both how close points in the same cluster are to each other and how separated a cluster is from its nearest neighboring cluster.
* **Interpretation**:
- A score close to 1 implies the sample is well clustered.
- A score close to 0 implies the sample is on or very close to the decision boundary between two neighboring clusters.
- A score close to -1 implies the sample is incorrectly clustered.
* **Result**:
* silhouette score: **0.02**
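
The definition above can be made concrete with a small sketch. The report does not say which library or distance metric produced the 0.02 value, so the Euclidean distance, the toy points, and the function names below are assumptions chosen only for illustration.

```python
from math import dist


def silhouette_samples(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every sample."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Singleton cluster: the coefficient is conventionally 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return scores


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples."""
    scores = silhouette_samples(points, labels)
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight groups that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]

well_separated = silhouette_score(points, labels)
# Same points, but labels that cut across the two groups.
mixed = silhouette_score(points, [0, 1, 0, 1])
```

With labels that follow the two groups the score is close to 1, while labels that cut across them give a negative score. A value near 0, such as the one reported here, sits between these two cases.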
### Davies-Bouldin Score
- **Definition**:
This metric evaluates the average similarity ratio of each cluster with its most similar cluster, where similarity is a ratio of within-cluster distances to between-cluster distances. Hence, the closer to 0 the DB index, the better.
- **Interpretation**:
- Lower values indicate better clustering.
- A lower Davies-Bouldin score relates to a model with better separation between the clusters.
- **Metric meaning**:
The Davies-Bouldin score evaluates the ratio of within-cluster to between-cluster distances.
- **Results**:
- Davies-Bouldin: **4.10**
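
As a rough illustration of the definition above, the sketch below computes the index by hand on toy data. It assumes Euclidean distances and centroid-based scatter; the report does not state how the 4.10 value was computed, and the function names and points are hypothetical.

```python
from math import dist


def centroid(rows):
    """Component-wise mean of a list of points."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def davies_bouldin(points, labels):
    """Average, over clusters, of the worst (s_i + s_j) / d_ij ratio."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    centers = {label: centroid(rows) for label, rows in groups.items()}
    # s_i: mean distance of a cluster's points to its own centroid.
    scatter = {
        label: sum(dist(p, centers[label]) for p in rows) / len(rows)
        for label, rows in groups.items()
    }

    worst_ratios = []
    for i in groups:
        # For each cluster keep the most similar (worst) neighbour.
        # Note: this sketch does not guard against coinciding centroids.
        worst_ratios.append(
            max(
                (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                for j in groups
                if j != i
            )
        )
    return sum(worst_ratios) / len(worst_ratios)


# Hypothetical toy data: two tight groups that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

db_separated = davies_bouldin(points, [0, 0, 1, 1])
# Labels that cut across the groups give overlapping clusters.
db_overlapping = davies_bouldin(points, [0, 1, 0, 1])
```

The well-separated labelling yields a small index, and the overlapping labelling yields a much larger one, which matches the "lower is better" reading.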
### Calinski-Harabasz Score (Variance Ratio Criterion)
- **Definition**:
This metric is the ratio of between-cluster dispersion to within-cluster dispersion, where dispersion is a sum of squared distances: cluster centroids to the global centroid (weighted by cluster size) for the between-cluster term, and samples to their own cluster centroid for the within-cluster term. Each term is normalized by its degrees of freedom. The score has no upper bound.
- **Interpretation**:
- Higher values indicate better clustering.
- A higher Calinski-Harabasz score relates to a model with denser and better-separated clusters.
- **Metric meaning**:
The Calinski-Harabasz score evaluates the ratio of between-cluster to within-cluster dispersion (sums of squared distances), rather than the worst-case ratio of within-cluster to between-cluster distances used by Davies-Bouldin.
- **Results**:
- Calinski-Harabasz score: **470.5**
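
The variance ratio can be sketched by hand as follows. This is a minimal illustration on hypothetical toy points using squared Euclidean distances; it is not the code that produced the 470.5 value, and the function names are made up for the example.

```python
def centroid(rows):
    """Component-wise mean of a list of points."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def calinski_harabasz(points, labels):
    """[B / (k - 1)] / [W / (n - k)] with B, W as sums of squares."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    n, k = len(points), len(groups)
    overall = centroid(points)

    between = 0.0  # B: size-weighted spread of the cluster centroids
    within = 0.0   # W: spread of the points around their own centroid
    for rows in groups.values():
        center = centroid(rows)
        between += len(rows) * squared_distance(center, overall)
        within += sum(squared_distance(p, center) for p in rows)

    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two tight groups that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

ch_separated = calinski_harabasz(points, [0, 0, 1, 1])
# Labels that cut across the groups give overlapping clusters.
ch_overlapping = calinski_harabasz(points, [0, 1, 0, 1])
```

Because the score is unbounded and grows with sample size, it is mainly useful for comparing alternative clusterings of the same data set rather than as an absolute quality threshold.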
## Cluster distribution
| Cluster | Size (number of customers) | Invoicing |
| ------- | ------------------------:|:--------------------:|
| 0 | 109.694 (41.2%) | 38.381.749 (14.9%) |
| 1 | 74.598 (28.0%) | 66.256.383 (25.7%) |
| 2 | 51.585 (19.4%) | 11.881.102 (4.6%) |
| 3 | 30.181 (11.3%) | 140.925.233 (54.7%) |
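
The percentages in the table can be re-derived from the raw counts, and the same figures give the average invoicing per customer in each cluster. The sketch below reads the table values with `.` as a thousands separator; the variable names are illustrative only, and the invoicing unit is left unspecified because the report does not state it.

```python
# (customers, invoicing) per cluster, copied from the table above.
clusters = {
    0: (109_694, 38_381_749),
    1: (74_598, 66_256_383),
    2: (51_585, 11_881_102),
    3: (30_181, 140_925_233),
}

total_customers = sum(size for size, _ in clusters.values())
total_invoicing = sum(amount for _, amount in clusters.values())

summary = {}
for cluster_id, (size, amount) in clusters.items():
    summary[cluster_id] = {
        "customer_share_pct": 100 * size / total_customers,
        "invoicing_share_pct": 100 * amount / total_invoicing,
        "invoicing_per_customer": amount / size,
    }

for cluster_id, row in summary.items():
    print(
        f"cluster {cluster_id}: "
        f"{row['customer_share_pct']:.1f}% of customers, "
        f"{row['invoicing_share_pct']:.1f}% of invoicing, "
        f"{row['invoicing_per_customer']:.1f} invoiced per customer"
    )
```

Dividing invoicing by customer count makes the contrast in the table explicit: cluster 3 is the smallest cluster but has by far the highest invoicing per customer.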
## SHAP analysis of cluster representatives
### Cluster 0:
* SHAP graph of feature importances:

* SHAP Most Important Features Heatmaps:
* Customer_Invoicing_PerPerson_Evolution_cat:

* Customer_Invoicing_Evolution%_cat:

* Customer_NoOfCases_Evolution%_cat:

* SHAPStory analysis using ChatGPT:
The AI model predicted with a high probability that the customer is part of the cluster, primarily driven by the customer's invoicing characteristics and case origin. The customer's invoicing per person (SHAP value: 2.453812) and overall invoicing evolution (SHAP value: 2.522134) were the most influential positive features, indicating that the customer's financial activity significantly contributed to the prediction. The customer's case origin (SHAP value: 0.615229) and updated customer segment (SHAP value: 0.576841) also played a substantial role, suggesting that the customer's interaction with the company and their customer type were important factors.
On the other hand, the customer's contract mean duration years (SHAP value: -0.765054) and customer seniority years group (SHAP value: -0.439289) negatively influenced the prediction, implying that the customer's shorter contract duration and lower seniority reduced the likelihood of being part of the cluster.
*In conclusion, the AI model's prediction was primarily influenced by the customer's financial activity and interaction with the company, while their contract duration and seniority acted as counteracting factors.*
### Cluster 1:
* SHAP graph of feature importances:

* SHAP Most Important Features Heatmaps:
* Customer_NoOfCases_Evolution_cat:

* Customer_Invoicing_Evolution%_cat:

* Amount_Invoicing_Per_Person_cat:

* SHAPStory analysis using ChatGPT:
Based on the SHAP values, the AI model's prediction that the customer belongs to a certain cluster seems to be primarily influenced by the customer's invoicing per person evolution, invoicing evolution, and commercial region. The customer's invoicing per person evolution had the highest positive SHAP value (3.63), indicating that this feature significantly contributed to the model's prediction. This suggests that the customer's increased invoicing per person over time is a strong indicator of their membership in this cluster.
Similarly, the customer's invoicing evolution also had a high positive SHAP value (3.18), suggesting that changes in the customer's total invoicing also played a crucial role in the prediction. The customer's commercial region (Bruxelles - Brabant) also had a high positive SHAP value (3.42), indicating that customers from this region are likely to be part of this cluster.
On the other hand, the customer's number of cases evolution had a negative SHAP value (-1.57), suggesting that this feature negatively influenced the prediction. This implies that the decrease in the number of cases from the customer over time made them less likely to be part of the cluster.
*In conclusion, the AI model's prediction was primarily driven by the customer's invoicing per person evolution, invoicing evolution, and commercial region, with the number of cases evolution acting as a counteracting factor.*
### Cluster 2:
* SHAP graph of feature importances:

* SHAP Most Important Features Heatmaps:
* Customer_NoOfCases_Evolution_cat:

* Customer_Invoicing_PerPerson_Evolution_cat:

* Customer_DirectIndirect:

* SHAPStory analysis using ChatGPT:
The AI model predicted with 100% certainty that the customer is part of the cluster. The most influential positive SHAP values were for the features 'Customer_Invoicing_PerPerson_Evolution_cat', 'Customer_DirectIndirect', and 'Customer_Invoicing_Evolution%_cat'. This suggests that the customer's invoicing per person evolution, whether they are a direct or indirect customer, and the percentage evolution of their total invoicing played significant roles in the model's prediction.
On the other hand, the most influential negative SHAP values were for 'Customer_Flag_IN' and 'contract_mean_duration_years_cat', indicating that the customer being a new company customer and the mean duration of their contracts negatively influenced the prediction.
Interactions between these features could also have contributed to the prediction. For instance, a customer with a high invoicing per person evolution and a high percentage evolution of total invoicing, who is also a direct customer, might be more likely to be part of the cluster.
*In summary, the customer's invoicing evolution, direct or indirect status, and contract duration, along with their status as a new or existing customer, were the key factors in the model's prediction that the customer is part of the cluster.*
### Cluster 3:
* SHAP graph of feature importances:

* SHAP Most Important Features Heatmaps:
* case_main_origin:

* Customer_NoOfCases_Evolution_cat:

* Customer_DirectIndirect:

* SHAPStory analysis using ChatGPT:
The AI model predicted with a high probability that the customer is part of a cluster, primarily based on the number of cases opened by the customer, the total invoicing of the customer, and the evolution of the number of cases from the customer. The customer had opened a significant number of cases (SHAP value: 5.941499), which is the most influential positive contributor to the prediction. The total invoicing of the customer (SHAP value: 1.51728) and the evolution of the number of cases from the customer (SHAP value: 1.505575) also contributed positively to the prediction.
On the other hand, the main category of customer's cases (SHAP value: -1.11408) and the customer direct type (SHAP value: -1.02424) negatively influenced the prediction. The customer's cases mainly fell into the category of 'Payroll & Salary Management', and the customer was classified as 'Indirect' in terms of direct type.
The model might have inferred from these features that the customer, despite having an indirect relationship and being involved in payroll and salary management, is highly engaged with the company, as indicated by the high number of cases and significant invoicing. This engagement, along with the increasing trend in the number of cases, led the model to classify the customer as part of the cluster.
*In summary, the classification was primarily driven by the customer's high engagement with the company, as indicated by the number of cases and invoicing, despite some negative influences from the customer's case category and direct type.*
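
The signed SHAP values quoted in the cluster analyses above are per-feature contributions that add up to the difference between the model output for one customer and a baseline output. The sketch below illustrates that additivity with exact Shapley values on a tiny, entirely hypothetical scoring function. It is not the pipeline used for this report: the feature names, the `toy_score` function, and the brute-force enumeration are assumptions, and the `shap` library relies on faster model-specific algorithms in practice.

```python
from itertools import permutations


def shapley_values(value_fn, feature_names):
    """Exact Shapley values by averaging marginal contributions
    over every ordering of the features (feasible only for a few)."""
    totals = {name: 0.0 for name in feature_names}
    orderings = list(permutations(feature_names))
    for order in orderings:
        present = set()
        for name in order:
            before = value_fn(present)
            present.add(name)
            totals[name] += value_fn(present) - before
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(present):
    """Hypothetical cluster-membership score for a set of known features."""
    score = 0.0
    if "invoicing_evolution" in present:
        score += 2.0  # pushes towards the cluster
    if "contract_duration" in present:
        score -= 1.0  # pushes away from the cluster
    if {"invoicing_evolution", "direct_customer"} <= present:
        score += 1.0  # interaction between two features
    return score


features = ["invoicing_evolution", "direct_customer", "contract_duration"]
phi = shapley_values(toy_score, features)

# Additivity: contributions sum to full score minus baseline score.
explained = sum(phi.values())
gap = toy_score(set(features)) - toy_score(set())
```

In this toy case the interaction bonus is split evenly between the two interacting features, which is why a feature can receive a non-zero SHAP value even when it has no effect on its own.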
## Business Features Analysis of all clusters
* Customer Segment:

* Total invoicing:

* Activity Sectors:

* Customer Size:

* XSelling number of Products:
