# Distance/Similarity Metrics
###### tags: `Data Science`, `Metrics`
## Metrics
| Metric<sup>1</sup> | Continuous variables | Dichotomous variables<sup>2</sup> |
| -------- | -------- | -------- |
| Manhattan distance | $D_{X,Y}=\sum_{i=1}^{n}\lvert x_i - y_i \rvert$ | $D_{X,Y}=a+b-2c$ |
| Euclidean distance | $D_{X,Y}=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}$ | $D_{X,Y}=\sqrt{a+b-2c}$ |
| Cosine similarity | $S_{X,Y}=\frac{\sum_{i=1}^{n}x_iy_i}{\sqrt{\sum_{i=1}^{n}(x_i)^2\sum_{i=1}^{n}(y_i)^2}}$ | $S_{X,Y}=\frac{c}{\sqrt{ab}}$ |
| Dice similarity | $S_{X,Y}=\frac{2\sum_{i=1}^{n}x_iy_i}{\sum_{i=1}^{n}(x_i)^2+\sum_{i=1}^{n}(y_i)^2}$ | $S_{X,Y}=\frac{2c}{(a+b)}$ |
| Tanimoto similarity | $S_{X,Y}=\frac{\sum_{i=1}^{n}x_iy_i}{\sum_{i=1}^{n}(x_i)^2+\sum_{i=1}^{n}(y_i)^2-\sum_{i=1}^{n}x_iy_i}$ | $S_{X,Y}=\frac{c}{(a+b-c)}$ |
| Soergel distance | $D_{X,Y}=\frac{\sum_{i=1}^{n}\lvert x_i-y_i \rvert}{\sum_{i=1}^{n}\max(x_i, y_i)}$ | $D_{X,Y}=1-\frac{c}{(a+b-c)}$ |

<sup>1</sup> $S$ denotes a similarity, while $D$ denotes a distance.

<sup>2</sup> $a$ is the number of on bits in vector $X$, $b$ is the number of on bits in vector $Y$, and $c$ is the number of bits that are on in both $X$ and $Y$.
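
As a quick sanity check on the table, the sketch below evaluates both columns for one pair of bit vectors and confirms that they agree. For 0/1 vectors, $x_i^2 = x_i$, so $\sum x_i^2 = a$, $\sum y_i^2 = b$ and $\sum x_i y_i = c$, which is why the continuous formulas collapse to the dichotomous ones. The function names `continuous_metrics` and `binary_metrics` and the example vectors are illustrative assumptions, not taken from the reference; both vectors are assumed to be non-zero so that no denominator vanishes.

```python
import math


def continuous_metrics(x, y):
    """Evaluate the continuous-variable formulas for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot = sum(xi * yi for xi, yi in zip(x, y))         # sum of x_i * y_i
    xx = sum(xi * xi for xi in x)                      # sum of x_i^2
    yy = sum(yi * yi for yi in y)                      # sum of y_i^2
    abs_diff = sum(abs(xi - yi) for xi, yi in zip(x, y))
    max_sum = sum(max(xi, yi) for xi, yi in zip(x, y))
    return {
        "manhattan": abs_diff,
        "euclidean": math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y))),
        "cosine": dot / math.sqrt(xx * yy),
        "dice": 2 * dot / (xx + yy),
        "tanimoto": dot / (xx + yy - dot),
        "soergel": abs_diff / max_sum,
    }


def binary_metrics(x, y):
    """Evaluate the dichotomous formulas from the counts a, b and c."""
    a = sum(x)                                         # on bits in X
    b = sum(y)                                         # on bits in Y
    c = sum(xi & yi for xi, yi in zip(x, y))           # bits on in both
    return {
        "manhattan": a + b - 2 * c,
        "euclidean": math.sqrt(a + b - 2 * c),
        "cosine": c / math.sqrt(a * b),
        "dice": 2 * c / (a + b),
        "tanimoto": c / (a + b - c),
        "soergel": 1 - c / (a + b - c),
    }


# Hypothetical 8-bit fingerprints: a = 4, b = 5, c = 3.
x = [1, 1, 0, 1, 0, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0, 1, 1]

continuous = continuous_metrics(x, y)
binary = binary_metrics(x, y)
for name in continuous:
    # Both columns of the table should give the same value on bit vectors.
    assert math.isclose(continuous[name], binary[name])
    print(f"{name:10s} {binary[name]:.4f}")
```

Note that for bit vectors the Soergel distance is exactly one minus the Tanimoto similarity, which the example reproduces (both are 0.5 here).
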
## Reference
* Bajusz D, Rácz A, Héberger K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?. J Cheminform. 2015;7:20. Published 2015 May 20. doi:10.1186/s13321-015-0069-3