[EUSFLAT 2023](https://www.eusflat2023.eu/)
# [A Bayesian Interpretation of Fuzzy C-Means](https://link.springer.com/chapter/10.1007/978-3-031-39965-7_37)
#### _[Corrado Mencar](http://www.di.uniba.it/~mencar)_, [Ciro Castiello](https://www.uniba.it/it/docenti/castiello-ciro)
[Dept. of Computer Science](http://www.di.uniba.it)
[University of Bari Aldo Moro](https://www.uniba.it), [Italy](https://www.italia.it/en)
---
## Motivations
- XAI ↠ provide *explanations* for decisions made by an AI
- An explanation attempts to trace **objective** relations of dependence from *explanans* to *explanandum*
  - See [Counterfactual Theory of Explanation](https://doi.org/10.1086/687859)
  - e.g., the explanandum is a fact and the explanans is a theoretical model under which the fact is true
---
### Fuzzy Logic and XAI
- FL enables the representation and processing of imprecise and gradual information, which are key aspects of human-centric information processing.
- However, adopting fuzzy logic does not by itself guarantee human-oriented explanations.
---
### (Fuzzy) Clustering and XAI
- Hard clustering: each data point belongs to one cluster only.
  - Possibly, a probability distribution over clusters, conditioned on the observed data points, is returned.
- Fuzzy clustering: a partial membership is assigned to each data point
  - usually read as the degree of similarity of a data point to a cluster prototype.
> In XAI the interpretation of membership degrees is instrumental to provide meaningful explanations.
---
### Fuzzy C-Means (FCM)
- Efficient, robust, few hyper-parameters, widely extended
- The sum of membership degrees of any data point to all clusters is equal to one.
- Clusters do not have a convex shape ⇝ membership cannot be interpreted as similarity
- Membership degrees have been interpreted as "degrees of sharing"
  - this notion needs an interpretation in a formal context before it can claim a clear meaning.
---

An example of two 1-d clusters centered on $v_1=0.3$ and $v_2=0.5$
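
The two-cluster example can be reproduced numerically. The following is a minimal sketch, assuming the standard FCM membership formula with Euclidean distance and a fuzzification parameter $m=2$ (the value of $m$ and the helper name `fcm_membership` are assumptions, not taken from the slides). It shows that the membership to the first cluster is not monotone in the distance from $v_1$.

```python
import numpy as np

def fcm_membership(x, prototypes, m=2.0, eps=1e-12):
    """FCM membership of 1-d points x (shape [n]) to 1-d prototypes (shape [c])."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(prototypes, dtype=float)
    d = np.abs(x[:, None] - v[None, :])   # distances, shape [n, c]
    d = np.maximum(d, eps)                # avoid division by zero at a prototype
    w = d ** (-2.0 / (m - 1.0))           # unnormalized memberships
    return w / w.sum(axis=1, keepdims=True)

xs = np.linspace(0.0, 1.0, 11)
u = fcm_membership(xs, [0.3, 0.5], m=2.0)   # m=2 is an assumed value
for x, row in zip(xs, u):
    print(f"x={x:.1f}  u1={row[0]:.3f}  u2={row[1]:.3f}")

# x=1.0 is farther from v1=0.3 than x=0.6, yet its membership to cluster 1 is higher.
print(u[10, 0] > u[6, 0])
```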
---
## Bayesian interpretation
### Assumption 1
Data are generated by a Mixture Model.
$f_X(x|\theta) = \sum_{i=1}^{c}{\Pr(C=i)f_{X|C}(x|i,\theta)}$
---
### Assumption 2
All clusters are equiprobable.
$\Pr(C=i)=\frac{1}{c}$
---
### Polar representation of data

---
### Assumption 3
Data generation depends on the distance from a prototype.
$f_{X|C}(x|i,\theta) = f_{A|D,C}(\alpha|d,i,\theta)\cdot f_{D|C}(d|i,\theta)$
$=\frac{1}{2\pi^{n-1}}\cdot f_{D|C}(d|i,\theta)$
---
### Assumption 4
The probability density of the distance has the form:
$$f_{D|C}(d|i,\theta) = \frac {1}{K} \cdot d^{-\beta}$$
where $\beta>0$ and $K$ is the normalizing constant over the range of distances $a \le d \le l$ (with $0<a<l$):
$$ K = \begin{cases} \log\left(l\right) - \log\left(a\right) & \mathrm{if}\ \beta=1\\ \\ \displaystyle{\frac{a^{1-\beta} - l^{1-\beta}}{\beta -1}} & \mathrm{if}\ \beta\neq 1 \end{cases} $$
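
As a sanity check, the constant $K$ can be verified numerically. The sketch below assumes that distances are supported on $[a, l]$, which is how the form of $K$ reads; the function names and the values of `a` and `l` are illustrative only. It integrates the density with a simple midpoint rule and prints a value close to 1 for each $\beta$.

```python
import math

def normalizing_constant(beta, a, l):
    """K such that d**(-beta) / K integrates to 1 on [a, l]."""
    if not (0 < a < l) or beta <= 0:
        raise ValueError("requires 0 < a < l and beta > 0")
    if math.isclose(beta, 1.0):
        return math.log(l) - math.log(a)
    return (a ** (1 - beta) - l ** (1 - beta)) / (beta - 1)

def distance_density(d, beta, a, l):
    """Density of the distance D given a cluster (Assumption 4)."""
    return d ** (-beta) / normalizing_constant(beta, a, l)

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

A, L = 0.1, 1.0   # illustrative bounds, not from the slides
for beta in (0.5, 1.0, 2.0, 4.0):
    total = integrate(lambda d: distance_density(d, beta, A, L), A, L)
    print(f"beta={beta}: integral = {total:.6f}")
```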
---
#### Examples of distribution

---
#### What is the meaning of $\beta$?
$\beta$ regulates the expected distance of observed data.
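
The effect of $\beta$ on the expected distance can be made concrete. This is a sketch under the same assumption that distances lie in $[a, l]$; the closed form for $\mathbb{E}[D]$ is obtained by integrating $d \cdot d^{-\beta}/K$ over that interval, and the bounds used here are illustrative. Larger values of $\beta$ give a smaller expected distance.

```python
import math

def normalizing_constant(beta, a, l):
    """K such that d**(-beta) / K integrates to 1 on [a, l]."""
    if math.isclose(beta, 1.0):
        return math.log(l) - math.log(a)
    return (a ** (1 - beta) - l ** (1 - beta)) / (beta - 1)

def expected_distance(beta, a, l):
    """E[D] = (1/K) * integral of d**(1 - beta) over [a, l]."""
    K = normalizing_constant(beta, a, l)
    if math.isclose(beta, 2.0):
        return (math.log(l) - math.log(a)) / K
    return (l ** (2 - beta) - a ** (2 - beta)) / ((2 - beta) * K)

A, L = 0.1, 1.0   # illustrative bounds, not from the slides
betas = (0.5, 1.0, 2.0, 4.0)
expectations = [expected_distance(b, A, L) for b in betas]
for b, e in zip(betas, expectations):
    print(f"beta={b}: E[D] = {e:.4f}")
```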

---
### Posterior Probability
$$
\begin{aligned}
\Pr\left(C=i|X=x,\theta\right)
&= \frac{f_{X|C}\left(x|i,\theta\right)\cdot\Pr\left(C=i\right)}{f_{X}\left(x|\theta\right)}\\
&=\frac{f_{A,D|C}\left(\alpha,d|i,\theta\right)\cdot\Pr\left(C=i\right)}{\sum_k{f_{A,D|C}\left(\alpha,d|k,\theta\right)\cdot\Pr\left(C=k\right)}}\\
&=\frac{\frac{1}{2\pi^{n-1}} \cdot \frac {1}{K} \cdot d_i^{-\beta}\cdot\frac{1}{c}}{\sum_{k}\left(\frac{1}{2\pi^{n-1}} \cdot \frac {1}{K} \cdot d_k^{-\beta}\cdot\frac{1}{c}\right)}\\
&=\frac{d_i^{-\beta}}{\sum_{k}d_k^{-\beta}}\\
\end{aligned}
$$
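
The cancellation in the last step can be checked numerically. The sketch below applies Bayes' rule with every factor kept explicit; the arguments `angular_const` and `K` stand for the constant angular density and the normalizing constant, and are hypothetical parameters introduced only to show that their values do not affect the result. The point `x` is assumed not to coincide with any prototype.

```python
import numpy as np

def posterior(x, prototypes, beta, angular_const=1.0, K=1.0):
    """Pr(C=i | X=x) under Assumptions 1-4, with all factors kept explicit."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(prototypes, dtype=float)
    c = len(v)
    d = np.linalg.norm(v - x, axis=1)             # d_k: distance to each prototype
    prior = np.full(c, 1.0 / c)                   # Assumption 2: equiprobable clusters
    likelihood = angular_const * d ** (-beta) / K # Assumptions 3 and 4
    joint = likelihood * prior
    return joint / joint.sum()                    # Bayes' rule

prototypes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
x = [0.2, 0.1]
p1 = posterior(x, prototypes, beta=2.0)
p2 = posterior(x, prototypes, beta=2.0, angular_const=0.37, K=12.5)
print(p1)
print(np.allclose(p1, p2))   # the constants cancel
```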
---
### Adjusting things
Let $m=1+\frac{2}{\beta}$ (⇒ $m>1$). Then:
$$
\Pr\left(C=i|X=x,\beta\right) = \frac{1}{\sum_{k=1}^{c}\left(\frac{d_{i}}{d_{k}}\right)^{\frac{2}{m-1}}}
$$
which coincides with the formula defining the degree of membership of a data sample to a cluster in FCM.
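
The correspondence can also be verified numerically. In this minimal sketch the distances are arbitrary illustrative values and the function names are hypothetical; for each $\beta$ it sets $m=1+\frac{2}{\beta}$ and compares the posterior $d_i^{-\beta}/\sum_k d_k^{-\beta}$ with the FCM membership formula.

```python
import numpy as np

def bayes_posterior(d, beta):
    """Posterior d_i**(-beta) / sum_k d_k**(-beta)."""
    w = d ** (-beta)
    return w / w.sum()

def fcm_membership(d, m):
    """FCM membership 1 / sum_k (d_i / d_k)**(2 / (m - 1))."""
    ratios = d[:, None] / d[None, :]   # ratios[i, k] = d_i / d_k
    return 1.0 / (ratios ** (2.0 / (m - 1.0))).sum(axis=1)

d = np.array([0.2, 0.5, 1.3])          # illustrative distances to three prototypes
for beta in (0.5, 1.0, 2.0, 4.0):
    m = 1.0 + 2.0 / beta
    same = np.allclose(bayes_posterior(d, beta), fcm_membership(d, m))
    print(f"beta={beta}, m={m}: formulas agree = {same}")
```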
---
## So what?
1. What is the "degree of sharing"? *It is the posterior probability that a data sample belongs to a cluster, given the aforementioned assumptions.*
2. What does $m$ mean? *It is related to the expected distance of a data sample from a cluster prototype.*
---
## A parallel with soft k-means (SKM)

1. Look at what happens at the prototypes;
2. Look at what happens at the extremes of the domain.
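
Both observations can be explored with a small numerical comparison. The sketch below assumes one common formulation of soft k-means, in which responsibilities are a softmax of negative squared distances scaled by a stiffness parameter; this formulation, the stiffness value, $m=2$, and the function names are assumptions, since the slides do not specify them. It evaluates both schemes at the prototypes and at points far from them.

```python
import numpy as np

def fcm(x, v, m=2.0, eps=1e-12):
    """FCM memberships of a 1-d point x to prototypes v."""
    d = np.maximum(np.abs(x - v), eps)   # guard against d = 0 at a prototype
    w = d ** (-2.0 / (m - 1.0))
    return w / w.sum()

def skm(x, v, stiffness=10.0):
    """Soft k-means responsibilities: softmax of -stiffness * squared distance."""
    logits = -stiffness * (x - v) ** 2
    logits = logits - logits.max()       # numerical stability
    w = np.exp(logits)
    return w / w.sum()

v = np.array([0.3, 0.5])
for x in (0.3, 0.4, 0.5, 10.0, 1000.0):
    print(f"x={x:>7}: FCM={np.round(fcm(x, v), 3)}  SKM={np.round(skm(x, v), 3)}")
```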
---
## Conclusion
- We are not contesting FCM; rather we are establishing an explanation of some concepts like "degree of sharing" and "fuzzification parameter" in the realm of a sound mathematical theory.
- The Bayesian interpretation opens the door to new extensions:
  - Bayesian modeling of the parameters;
  - Non-equiprobable clusters;
  - Different expected values for each cluster;
  - etc.
---
# Thank you