[EUSFLAT 2023](https://www.eusflat2023.eu/)

# [A Bayesian Interpretation of Fuzzy C-Means](https://link.springer.com/chapter/10.1007/978-3-031-39965-7_37)

#### _[Corrado Mencar](http://www.di.uniba.it/~mencar)_, [Ciro Castiello](https://www.uniba.it/it/docenti/castiello-ciro)

[Dept. of Computer Science](http://www.di.uniba.it)
[University of Bari Aldo Moro](https://www.uniba.it), [Italy](https://www.italia.it/en)

---

## Motivations

- XAI ↠ provide *explanations* of decisions made by an AI
- An explanation attempts to trace **objective** relations of dependence from *explanans* to *explanandum*
  - See [Counterfactual Theory of Explanation](https://doi.org/10.1086/687859)
  - e.g., the explanandum is a fact and the explanans is a theoretical model under which the fact is true

---

### Fuzzy Logic and XAI

- FL enables the representation and processing of imprecise and gradual information,
  - key aspects of human-centric information processing.
- However, merely adopting fuzzy logic does not guarantee human-oriented explanations.

---

### (Fuzzy) Clustering and XAI

- Hard clustering: each data point belongs to one cluster only.
  - Possibly, a probability distribution over clusters conditioned on the observed data points is returned.
- Fuzzy clustering: partial membership assigned to data points
  - degree of similarity of a data point with respect to a cluster prototype.

> In XAI the interpretation of membership degrees is instrumental to provide meaningful explanations.

---

### Fuzzy C-Means (FCM)

- Efficient, robust, few hyper-parameters, widely extended
- The sum of membership degrees of any data point to all clusters is equal to one.
- Clusters do not have a convex shape ⇝ membership cannot be interpreted as similarity
- Membership degrees have been interpreted as "degrees of sharing"
  - this requires an interpretation in a formal context to claim a clear meaning.

---

![](https://hackmd.io/_uploads/H1RVHQ562.png)

An example of two 1-d clusters centered on $v_1=0.3$ and $v_2=0.5$

---

## Bayesian interpretation

### Assumption 1

Data are generated by a Mixture Model.

$f_X(x|\theta) = \sum_{i=1}^{c}{\Pr(C=i)f_{X|C}(x|i,\theta)}$

---

### Assumption 2

All clusters are equi-probable.

$\Pr(C=i)=\frac{1}{c}$

---

### Polar representation of data

![](https://hackmd.io/_uploads/ryKcoXq62.png)

---

### Assumption 3

Data generation depends on the distance from a prototype.

With $x$ in polar form relative to the prototype of cluster $i$ (distance $d$, direction $\alpha$), the direction density is constant:

$$
\begin{aligned}
f_{X|C}(x|i,\theta) &= f_{A|D,C}(\alpha|d,i,\theta)\cdot f_{D|C}(d|i,\theta)\\
&=\frac{1}{2\pi^{n-1}}\cdot f_{D|C}(d|i,\theta)
\end{aligned}
$$

---

### Assumption 4

Distance generation probability has the form:

$$f_{D|C}(d|i,\theta) = \frac {1}{K} \cdot d^{-\beta}$$

where $\beta>0$ and $K$ normalizes the density over the distance range $[a, l]$, with $0<a<l$:

$$
K = \begin{cases}
\log\left(l\right) - \log\left(a\right) & \mathrm{if}\ \beta=1\\
\\
\displaystyle{\frac{a^{1-\beta} - l^{1-\beta}}{\beta -1}} & \mathrm{if}\ \beta\neq 1
\end{cases}
$$

---

#### Examples of distribution

![](https://hackmd.io/_uploads/BJ0bbE9a3.png)

---

#### What is the meaning of $\beta$?

$\beta$ regulates the expected distance of observed data.

![](https://hackmd.io/_uploads/SkBYGNcan.png)

---

### Posterior Probability

$$
\begin{aligned}
\Pr\left(C=i|X=x,\theta\right) &= \frac{f_{X|C}\left(x|i,\theta\right)\cdot\Pr\left(C=i\right)}{f_{X}\left(x|\theta\right)}\\
&=\frac{f_{A,D|C}\left(\alpha,d|i,\theta\right)\cdot\Pr\left(C=i\right)}{\sum_k{f_{A,D|C}\left(\alpha,d|k,\theta\right)}\cdot\Pr\left(C=k\right)}\\
&=\frac{\frac{1}{2\pi^{n-1}} \cdot \frac {1}{K} \cdot d_i^{-\beta}\cdot\frac{1}{c}}{\sum_{k}\left(\frac{1}{2\pi^{n-1}} \cdot \frac {1}{K} \cdot d_k^{-\beta}\cdot\frac{1}{c}\right)}\\
&=\frac{d_i^{-\beta}}{\sum_{k}d_k^{-\beta}}
\end{aligned}
$$

where $d_i$ is the distance of $x$ from the prototype of cluster $i$.

---

### Adjusting things

Let $m=1+\frac{2}{\beta}$ (⇒ $m>1$).
Then:

\begin{equation}
\Pr\left(C=i|X=x,\beta\right) = \frac{1}{\sum_{k=1}^{c}\left(\frac{d_{i}}{d_{k}}\right)^{\frac{2}{m-1}}}
\end{equation}

which coincides with the formula defining the degree of membership of a data sample to a cluster in FCM.

---

## So what?

1. What is the "degree of sharing"?
   *It is the posterior probability that a data sample belongs to a cluster, given the aforementioned assumptions.*
2. What does $m$ mean?
   *Through $\beta=\frac{2}{m-1}$, it is related to the expected distance of a data sample from a cluster prototype.*

---

## A parallel with soft k-means (SKM)

![](https://hackmd.io/_uploads/S1v4wEq62.png)

1. Look at what happens at the prototypes;
2. Look at what happens at the extremes of the domain.

---

## Conclusion

- We are not contesting FCM; rather, we are grounding concepts such as "degree of sharing" and "fuzzification parameter" in a sound mathematical theory.
- The Bayesian interpretation opens the door to new extensions
  - Bayesian modeling of the parameters;
  - Non-equiprobable clusters;
  - Different expected values for each cluster;
  - etc.

---

# Thank you
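
---

## Backup: numerical check

The equivalence between the posterior $d_i^{-\beta}/\sum_k d_k^{-\beta}$ and the FCM membership formula can be checked numerically. The following is a minimal sketch, not the authors' code: the function names, the query point `x = 0.35` and the choice `beta = 2.0` are illustrative assumptions; the prototypes reuse $v_1=0.3$ and $v_2=0.5$ from the 1-d example.

```python
import numpy as np


def posterior(d, beta):
    """Posterior Pr(C=i | x) = d_i^(-beta) / sum_k d_k^(-beta).

    Assumes all distances are strictly positive.
    """
    w = np.asarray(d, dtype=float) ** (-beta)
    return w / w.sum()


def fcm_membership(d, m):
    """FCM membership u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))."""
    d = np.asarray(d, dtype=float)
    # ratios[i, k] = (d_i / d_k)^(2 / (m - 1)), built by broadcasting
    ratios = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratios.sum(axis=1)


# Prototypes from the 1-d example; x and beta are hypothetical choices.
prototypes = np.array([0.3, 0.5])
x = 0.35
beta = 2.0
m = 1.0 + 2.0 / beta  # the substitution from "Adjusting things"

d = np.abs(x - prototypes)  # distances of x from each prototype
p = posterior(d, beta)
u = fcm_membership(d, m)

print("posterior      :", p)
print("FCM membership :", u)
print("equal          :", np.allclose(p, u))
```

With these values both vectors are approximately $(0.9, 0.1)$ and sum to one, as expected from the derivation.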