# Customer Segmentation 6 (5 clusters): 08/11/2023

* Number of clusters: **5**
* Segmentation year: **2023**
* Selected features:
    * Customer_Financial_Health
    * Customer_Commercial_Region
    * Customer_Language
    * Customer_Size
    * Customer_Seniority_YearsGroup
    * Customer_DirectIndirect
    * Customer_MainPartner_Type_Segment
    * contract_distinct_product_group
    * updated_customer_segment
    * Customer_Sector_category
    * XSelling_NoLegalEntity_cat
    * XSelling_NoProducts_cat
    * Customer_Size_Evolution%_cat
    * Customer_NoOfProducts_Evolution_cat
    * cluster_id

## Clustering-specific metrics:

### Silhouette Score

* **Definition**: This metric is the mean silhouette coefficient over all samples. For each sample, let *a* be its average distance to the other members of its own cluster (cohesion) and *b* its average distance to the members of the nearest cluster to which it doesn't belong (separation). The sample's silhouette coefficient is (b - a) / max(a, b), which ranges from -1 to 1.
* **Metric meaning**: The Silhouette Score considers both how close points in the same cluster are to each other and how separated a cluster is from its nearest neighboring cluster.
* **Interpretation**:
    - A score close to 1 implies the sample is well clustered.
    - A score close to 0 implies the sample is on or very close to the decision boundary between two neighboring clusters.
    - A score close to -1 implies the sample is incorrectly clustered.
* **Result**:
    * silhouette score: **0.30**

### Davies-Bouldin Score

- **Definition**: This metric evaluates the average similarity of each cluster with its most similar cluster, where similarity is the ratio of within-cluster distances to between-cluster distances. Hence, the closer the DB index is to 0, the better.
- **Interpretation**:
    - Lower values indicate better clustering.
    - A lower Davies-Bouldin score relates to a model with better separation between the clusters.
- **Metric meaning**: The Davies-Bouldin Score evaluates the ratio of within-cluster to between-cluster distances.
- **Results**:
    - Davies-Bouldin: **7.30**

### Calinski-Harabasz Score (Variance Ratio Criterion)

- **Definition**: This metric is the ratio of between-cluster dispersion to within-cluster dispersion, where dispersion is a sum of squared distances (cluster centroids to the overall centroid, and samples to their own cluster centroid, respectively). Each term is normalized by its degrees of freedom: the number of clusters minus one, and the number of samples minus the number of clusters.
- **Interpretation**:
    - Higher values indicate better clustering.
    - A higher Calinski-Harabasz score relates to a model with dense, well-separated clusters.
- **Metric meaning**: The Calinski-Harabasz Score also compares between-cluster and within-cluster spread, but it uses squared distances to centroids rather than the pairwise cluster similarity used by Davies-Bouldin.
- **Results**:
    - Calinski-Harabasz score: **618.0**

## Cluster distribution

| Cluster | Size (number of customers) | Invoicing           |
| ------- | --------------------------:|:-------------------:|
| 0       | 158.815 (59.7%)            | 57.812.312 (22.5%)  |
| 1       | 41.889 (15.7%)             | 80.682.897 (31.3%)  |
| 2       | 35.039 (13.2%)             | 85.343.475 (33.2%)  |
| 3       | 20.425 (7.7%)              | 25.040.103 (9.7%)   |
| 4       | 9.890 (3.7%)               | 8.565.681 (3.3%)    |

## Cluster representative SHAP analysis

### Cluster 0:

* SHAP graph for feature importances: ![cluster_0.png](https://hackmd.io/_uploads/rkmlzR_XT.png)
* SHAP most important features heatmaps:
    * updated_customer_segment: ![heatmap_segment.png](https://hackmd.io/_uploads/r1LnI7FXp.png)
    * Customer_Financial_Health: ![heatmap_financial_health.png](https://hackmd.io/_uploads/HkJJwmYQp.png)
    * Customer_Sector_category: ![heatmap_sector.png](https://hackmd.io/_uploads/BkU-w7KXp.png)
* SHAPStory analysis using ChatGPT:

The AI model predicted with 100% certainty that the customer is part of the cluster. The most influential positive SHAP values were the 'Customer_Commercial_Region', 'Customer_Seniority_YearsGroup', and 'updated_customer_segment'.
This suggests that the customer's commercial region of 'Liège - Verviers - Eupen - Namur', their seniority of 10-15 years, and their classification as a 'Principal_Entrepreneur' significantly contributed to their classification as part of the cluster. On the other hand, the 'Customer_Financial_Health' had a high negative SHAP value, indicating that the customer's unknown financial health negatively influenced their classification. The model also took into account the customer's indirect relationship with the company, their size, and the number of different legal entities and products for cross-selling contracts.

*In summary, the customer's commercial region, seniority, and entrepreneur status were the most influential factors in their classification as part of the cluster, despite their unknown financial health.*

### Cluster 1:

* SHAP graph for feature importances: ![cluster_1.png](https://hackmd.io/_uploads/SyOgfA_76.png)
* SHAP most important features heatmaps:
    * Customer_DirectIndirect: ![heatmap_direct_indirect.png](https://hackmd.io/_uploads/SkBwwXFXp.png)
    * Customer_Language: ![heatmap_language.png](https://hackmd.io/_uploads/r1QFP7tQa.png)
    * Customer_Financial_Health: ![heatmap_financial_health.png](https://hackmd.io/_uploads/HkJJwmYQp.png)
* SHAPStory analysis using ChatGPT:

The AI model predicted that the customer is part of a cluster with a high probability of 93.34%. The most influential positive SHAP values were the Customer_Language, contract_distinct_product_group, and Customer_Seniority_YearsGroup. This suggests that the customer's language being Dutch, the distinct product group of 625, and the customer's seniority of 20+ years significantly contributed to the prediction. On the other hand, the most influential negative SHAP value was the Customer_DirectIndirect feature, indicating that the customer's indirect type negatively influenced the prediction. This could mean that customers with direct contracts are more likely to be part of the cluster.
The interaction between these features could be that long-standing customers who speak Dutch and have a diverse range of products are more likely to be part of the cluster, especially if they have direct contracts.

*In summary, the classification may have occurred due to the customer's language, product diversity, seniority, and type of contract.*

### Cluster 2:

* SHAP graph for feature importances: ![cluster_2.png](https://hackmd.io/_uploads/r1yzMR_ma.png)
* SHAP most important features heatmaps:
    * Customer_DirectIndirect: ![heatmap_direct_indirect.png](https://hackmd.io/_uploads/SkBwwXFXp.png)
    * Customer_Financial_Health: ![heatmap_financial_health.png](https://hackmd.io/_uploads/HkJJwmYQp.png)
    * XSelling_NoProducts_cat: ![heatmap_xselling.png](https://hackmd.io/_uploads/BJhAP7KQT.png)
* SHAPStory analysis using ChatGPT:

The AI model predicted with 100% certainty that the customer is part of a cluster. The most influential positive SHAP values were for the features 'Customer_MainPartner_Type_Segment', 'contract_distinct_product_group', and 'Customer_Sector_category'. This suggests that the customer's main partner being an 'Accountant 4', the customer having a diverse range of contracts (625,120 distinct product groups), and the customer's sector being 'Other' significantly contributed to the prediction. On the other hand, the features 'Customer_Size_Evolution%_cat' and 'Customer_NoOfProducts_Evolution_cat' had the least influence on the prediction, indicating that the customer's company size and the number of products did not change significantly. The model also considered the customer's financial health, commercial region, and language, which were 'Green', 'Hainaut', and 'French' respectively. These features also had relatively high SHAP values, indicating their importance in the prediction.

In summary, the model's prediction was primarily influenced by the customer's main partner type, the diversity of their contracts, and their sector.
The stability of the customer's company size and number of products also played a role in the prediction.

### Cluster 3:

* SHAP graph for feature importances: ![cluster_3.png](https://hackmd.io/_uploads/SyEzMAu7a.png)
* SHAP most important features heatmaps:
    * Customer_DirectIndirect: ![heatmap_direct_indirect.png](https://hackmd.io/_uploads/SkBwwXFXp.png)
    * Customer_Language: ![heatmap_language.png](https://hackmd.io/_uploads/r1QFP7tQa.png)
    * Customer_MainPartner_Type_Segment: ![heatmap_main_partner_type.png](https://hackmd.io/_uploads/rkve_QFXT.png)
* SHAPStory analysis using ChatGPT:

The AI model predicted with 82.72% probability that this customer is part of a certain cluster. The most influential positive SHAP values were for the features 'Customer_Seniority_YearsGroup', 'Customer_Commercial_Region', and 'Customer_Language'. This suggests that the customer's long-standing relationship with the company (5-10 years), their location in the Hainaut region, and their preference for the French language were significant contributors to the model's prediction. On the other hand, the most influential negative SHAP values were for 'Customer_Size' and 'contract_distinct_product_group'. This indicates that the small size of the customer's company and the large number of distinct product groups in their contract negatively influenced the model's prediction. There may be an interaction between 'Customer_Seniority_YearsGroup' and 'contract_distinct_product_group', where long-standing customers tend to have contracts with a wider range of product groups.

*In summary, the customer's long-standing relationship, location, and language preference, combined with their small company size and diverse product contract, led the model to predict that they belong to this particular cluster.*

### Cluster 4:

* SHAP graph for feature importances: ![cluster_4.png](https://hackmd.io/_uploads/rJtzGCd7T.png)
* SHAP most important features heatmaps:
    * Customer_DirectIndirect: ![heatmap_direct_indirect.png](https://hackmd.io/_uploads/SkBwwXFXp.png)
    * contract_distinct_product_group: ![heatmap_product_group.png](https://hackmd.io/_uploads/Hy64dXFXT.png)
    * Customer_MainPartner_Type_Segment: ![heatmap_main_partner_type.png](https://hackmd.io/_uploads/rkve_QFXT.png)
* SHAPStory analysis using ChatGPT:

## Business Features Analysis of all clusters

* Customer Segment: ![heatmap_segment.png](https://hackmd.io/_uploads/r1LnI7FXp.png)
* Activity Sector: ![heatmap_sector.png](https://hackmd.io/_uploads/BkU-w7KXp.png)
* XSelling number of products: ![heatmap_xselling.png](https://hackmd.io/_uploads/BJhAP7KQT.png)
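
As a supplement to the clustering-specific metrics reported earlier (Silhouette, Davies-Bouldin, Calinski-Harabasz), the sketch below shows one way the three scores can be computed from their textbook definitions. It is only an illustration, not the pipeline used for this segmentation: it assumes Euclidean distance, uses made-up 2-D points, and the helper names (`group_by_label`, `centroid`) are hypothetical. The metric function names echo their scikit-learn counterparts in `sklearn.metrics`, which would normally be the practical choice on real data; how the categorical features of this report were encoded before scoring is not stated here and is not reproduced.

```python
from math import dist
from statistics import mean


def group_by_label(points, labels):
    """Hypothetical helper: map each cluster label to its list of points."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def centroid(points):
    """Hypothetical helper: coordinate-wise mean of a list of points."""
    return tuple(mean(coords) for coords in zip(*points))


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all samples (Euclidean distance)."""
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the *other* members (distance to itself is 0)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


def davies_bouldin_score(points, labels):
    """Average, over clusters, of the worst (s_i + s_j) / d(c_i, c_j) ratio."""
    clusters = group_by_label(points, labels)
    cents = {k: centroid(v) for k, v in clusters.items()}
    # s_k: mean distance of a cluster's points to its own centroid
    scatter = {k: mean(dist(p, cents[k]) for p in v) for k, v in clusters.items()}
    worst = [
        max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in clusters if j != i)
        for i in clusters
    ]
    return mean(worst)


def calinski_harabasz_score(points, labels):
    """Between-cluster over within-cluster dispersion, each per degree of freedom."""
    clusters = group_by_label(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(v) * dist(centroid(v), overall) ** 2 for v in clusters.values())
    within = sum(dist(p, centroid(v)) ** 2 for v in clusters.values() for p in v)
    return (between / (k - 1)) / (within / (n - k))


# Toy data (an assumption for illustration): two compact, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = [0, 0, 0, 1, 1, 1]

sil = silhouette_score(points, labels)
db = davies_bouldin_score(points, labels)
ch = calinski_harabasz_score(points, labels)
print(f"silhouette={sil:.3f}  davies_bouldin={db:.3f}  calinski_harabasz={ch:.1f}")
```

On such cleanly separated toy data the silhouette score is close to 1, the Davies-Bouldin score is close to 0, and the Calinski-Harabasz score is large, which matches the direction of each interpretation given in the metrics section (higher is better for Silhouette and Calinski-Harabasz, lower is better for Davies-Bouldin).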