# Midterm [Practical Part]

* We want to approximate this data using linear regressions and fuzzy c-means (FCM)

![](https://i.imgur.com/L9i6lee.png)

## Create a set of clusters from the data using FCM

* Extend your 1-D implementation of clustering to N-D
* After fitting, for each point you will have:
  * N_CLUSTERS membership probabilities
  * argmax over the N_CLUSTERS probabilities - the corresponding cluster
* I used N_CLUSTERS = 3:

![](https://i.imgur.com/sqazODR.png)

## For each cluster, fit a linear regression locally

### Normal equation

![](https://i.imgur.com/18ay84l.png)

* For each cluster we fit a linear regression - coefficients [theta0, theta1]
* In the end we have N_CLUSTERS linear regressions, stored with shape (N_CLUSTERS, 2)

### Draw these regressions:

![](https://i.imgur.com/jyvzfpU.png)

## Create the final predictor (degranularized model)

![](https://i.imgur.com/YBsBJ0S.png)

* Ai(xj) - the probability that point xj belongs to cluster i
* ai - the coefficients of the i-th linear regression
* shape(A) is (N_POINTS, N_CLUSTERS)
* shape(a) is (N_CLUSTERS, 2)
* shape(X) is (N_POINTS, 2) - the second dimension holds the bias column

## This formula can be vectorized

`y = (X @ a.T * A).sum(1)`

* `@` is matrix multiplication
* `*` is elementwise multiplication
* `.sum(1)` sums over axis 1 (the cluster dimension)
* shapes: ((N_POINTS, 2) @ (2, N_CLUSTERS)) `*` (N_POINTS, N_CLUSTERS), then `.sum(1)` -> (N_POINTS,)

### So for any point xi we get a weighted sum of the linear regressions' outputs, where each weight is the probability that the point belongs to that cluster.

## Calculate MSE

![](https://i.imgur.com/I1yKgcb.png)

* Implement mean squared error from scratch or use sklearn

## Plot the degranularized model

![](https://i.imgur.com/2cWWilI.png)
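The FCM clustering step above could be sketched as follows. This is a minimal N-D fuzzy c-means, not the assignment's reference solution; the function name `fcm` and the fuzziness parameter `m = 2` are my assumptions.

```python
import numpy as np

def fcm(X, n_clusters=3, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means for points of any dimension (hypothetical sketch).

    X: (n_points, n_dims). Returns (centers, U), where U[j, i] is the
    membership probability of point j in cluster i (each row sums to 1).
    """
    rng = np.random.default_rng(seed)
    # random initial memberships, normalized so each point's row sums to 1
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # cluster centers: membership-weighted means of the points
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # point-to-center distances, shape (n_points, n_clusters)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        # standard FCM membership update: u_ji proportional to d_ji^(-2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U
```

The hard cluster assignment described above is then `U.argmax(axis=1)`.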
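The per-cluster normal-equation fit could look like the sketch below; the function name `fit_local_linregs` is my own, and it assumes 1-D inputs with hard cluster labels from the argmax step.

```python
import numpy as np

def fit_local_linregs(x, y, labels, n_clusters):
    """Fit one line per cluster via the normal equation (hypothetical sketch).

    x, y: (n_points,) data; labels: (n_points,) hard cluster assignments.
    Returns a of shape (n_clusters, 2), rows [theta0, theta1]
    (intercept, slope), matching the (N_CLUSTERS, 2) layout above.
    """
    a = np.zeros((n_clusters, 2))
    for i in range(n_clusters):
        mask = labels == i
        # design matrix with a bias column: [1, x]
        Xd = np.column_stack([np.ones(mask.sum()), x[mask]])
        # normal equation theta = (X^T X)^{-1} X^T y, via solve for stability
        a[i] = np.linalg.solve(Xd.T @ Xd, Xd.T @ y[mask])
    return a
```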
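The vectorized degranularized predictor and the from-scratch MSE could be sketched as below; the function names are my assumptions, but the one-liner is exactly the `y = (X @ a.T * A).sum(1)` formula above.

```python
import numpy as np

def degranularized_predict(x, a, A):
    """Membership-weighted sum of the local lines' outputs.

    x: (n_points,) inputs; a: (n_clusters, 2) rows of [theta0, theta1];
    A: (n_points, n_clusters) membership probabilities.
    Returns (n_points,) predictions.
    """
    # design matrix (N_POINTS, 2), second column-pair convention: [1, x]
    X = np.column_stack([np.ones_like(x), x])
    # (N_POINTS, 2) @ (2, N_CLUSTERS) gives every line's output at every
    # point; weight elementwise by A and sum over the cluster axis
    return (X @ a.T * A).sum(1)

def mse(y_true, y_pred):
    """Mean squared error implemented from scratch."""
    return ((y_true - y_pred) ** 2).mean()
```

For example, with memberships of exactly 1 for one cluster, each point just follows its own cluster's line; with equal 0.5/0.5 memberships, the prediction is the average of the two lines.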