# Distance/Similarity Metrics
###### tags: `Data Science`, `Metrics`

## Metrics

| Metric | Continuous variables | Dichotomous variables<sup>1, 2</sup> |
| -------- | -------- | -------- |
| Manhattan distance | $D_{X,Y}=\sum_{i=1}^{n}\lvert x_i - y_i \rvert$ | $D_{X,Y}=a+b-2c$ |
| Euclidean distance | $D_{X,Y}=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}$ | $D_{X,Y}=\sqrt{a+b-2c}$ |
| Cosine similarity | $S_{X,Y}=\frac{\sum_{i=1}^{n}x_iy_i}{\sqrt{\sum_{i=1}^{n}(x_i)^2\sum_{i=1}^{n}(y_i)^2}}$ | $S_{X,Y}=\frac{c}{\sqrt{ab}}$ |
| Dice similarity | $S_{X,Y}=\frac{2\sum_{i=1}^{n}x_iy_i}{\sum_{i=1}^{n}(x_i)^2+\sum_{i=1}^{n}(y_i)^2}$ | $S_{X,Y}=\frac{2c}{a+b}$ |
| Tanimoto similarity | $S_{X,Y}=\frac{\sum_{i=1}^{n}x_iy_i}{\sum_{i=1}^{n}(x_i)^2+\sum_{i=1}^{n}(y_i)^2-\sum_{i=1}^{n}x_iy_i}$ | $S_{X,Y}=\frac{c}{a+b-c}$ |
| Soergel distance | $D_{X,Y}=\frac{\sum_{i=1}^{n}\lvert x_i-y_i \rvert}{\sum_{i=1}^{n}\max(x_i, y_i)}$ | $D_{X,Y}=1-\frac{c}{a+b-c}$ |

<sup>1</sup> $S$ denotes similarity, while $D$ denotes distance.

<sup>2</sup> $a$ is the number of on bits in vector $X$, $b$ is the number of on bits in vector $Y$, and $c$ is the number of bits that are on in both $X$ and $Y$.

## Reference
* Bajusz D, Rácz A, Héberger K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J Cheminform. 2015;7:20. Published 2015 May 20. doi:10.1186/s13321-015-0069-3
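
## Example

As a rough illustration of the formulas in the Metrics table, the sketch below implements the continuous-variable form of each metric in plain Python and checks that, on a pair of binary vectors, the results match the dichotomous shortcuts written in terms of $a$, $b$, and $c$. The function names and the two example fingerprints `X` and `Y` are made up for this illustration and do not come from the reference. The sketch assumes non-zero vectors (otherwise the ratios divide by zero) and non-negative values for the Soergel distance.

```python
import math


def _dot(x, y):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def manhattan(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def cosine(x, y):
    return _dot(x, y) / math.sqrt(_dot(x, x) * _dot(y, y))


def dice(x, y):
    return 2 * _dot(x, y) / (_dot(x, x) + _dot(y, y))


def tanimoto(x, y):
    xy = _dot(x, y)
    return xy / (_dot(x, x) + _dot(y, y) - xy)


def soergel(x, y):
    # Assumes non-negative components so the denominator is positive.
    return manhattan(x, y) / sum(max(xi, yi) for xi, yi in zip(x, y))


# Hypothetical binary fingerprints, chosen only for illustration.
X = [1, 1, 0, 1, 0, 0, 1, 0]
Y = [1, 0, 0, 1, 1, 0, 1, 1]

a = sum(X)                                # on bits in X
b = sum(Y)                                # on bits in Y
c = sum(xi & yi for xi, yi in zip(X, Y))  # bits on in both X and Y

# Continuous formulas vs. dichotomous shortcuts on the same binary input.
comparisons = {
    "Manhattan": (manhattan(X, Y), a + b - 2 * c),
    "Euclidean": (euclidean(X, Y), math.sqrt(a + b - 2 * c)),
    "Cosine": (cosine(X, Y), c / math.sqrt(a * b)),
    "Dice": (dice(X, Y), 2 * c / (a + b)),
    "Tanimoto": (tanimoto(X, Y), c / (a + b - c)),
    "Soergel": (soergel(X, Y), 1 - c / (a + b - c)),
}

for name, (continuous, dichotomous) in comparisons.items():
    print(f"{name:10s} continuous={continuous:.4f} dichotomous={dichotomous:.4f}")
```

For binary vectors the two columns agree because $x_i^2 = x_i$, so $\sum x_i^2 = a$, $\sum y_i^2 = b$, and $\sum x_i y_i = c$. The same substitution shows that the Soergel distance is the complement of the Tanimoto similarity in the dichotomous case.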