# ANN
- Symbols:
- N: num dataset items.
- d: length of one item.
- k: num items to query.
## LSH:
- Generate random m vectors of length d.
- Hash function: for each generated vector, assign 1 if same direction, otherwise 0.

Calculate the hash for all dataset items and the query item.
- Within the dataset items having same (or minimize hamming distance) hash with the query item:
- Brute force all.
- Return random k vectors (**more common** if m >> d).
- Complexity: $O(m^2)$
- Calculate hash: $O(m.d)$
- Search: $O(m^2)$
## PQ:
- Hash function:
- Segment a vector into m sub-vectors.

- In each sub-vector position, calculate c centroids using k-Means from all dataset items.

- Replace each sub-vector with its centroid ID (length = log2\(c\))
- For query item, calculate distance between its sub-vectors and all centroids.
- For all dataset items, get the sum distance to the query item based on above distance to centroids.

- Complexity: $O(d.c) + O(N.m)$
- Calculate distance to centroids: $O(d.c)$
- Search: $O(N.m)$
## IVFPQ:
- Find c clusters from dataset items using kMeans.
- For each cluster, calculate PQ hash for all residual vectors = raw - centroid.
- For the query item, find its cluster and nearby clusters, query PQ hash for all items in those clusters.