# Deduplication workplan, week starting 22/07
## Antonio
* Create a notebook showing how to extract image fingerprints (wavelet hash, block-mean hash, image size, and ResNet activations), with a summary of findings
* Create a synthetic dataset of cropped and zoomed image pairs (benchmark for primary-image deduplication)
* Try logistic regression for deduplication using geolocation, addresses, names, and image pHashes (to match beyond the primary image)
* Look into Apache Beam for deploying the algorithm in BigQuery
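
As a starting point for the synthetic benchmark and the fingerprint notebook, here is a minimal sketch in pure Python. It assumes grayscale images held as nested lists, and every name in it (`make_image`, `crop_and_zoom`, `blockmean_hash`, `hamming`) is hypothetical. The hash is a simplified block-mean variant (one bit per block, thresholded at the median of the block means) and may differ from the one the notebook ends up using.

```python
import random


def make_image(width, height, cells=4, seed=0):
    """Hypothetical stand-in for a real photo: a coarse random grid of
    grayscale values (0-255) upscaled to width x height."""
    rng = random.Random(seed)
    grid = [[rng.randrange(256) for _ in range(cells)] for _ in range(cells)]
    return [[grid[y * cells // height][x * cells // width] for x in range(width)]
            for y in range(height)]


def crop_and_zoom(image, margin):
    """Crop `margin` pixels from every side, then zoom back to the original
    size with nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    cropped = [row[margin:width - margin] for row in image[margin:height - margin]]
    ch, cw = len(cropped), len(cropped[0])
    return [[cropped[y * ch // height][x * cw // width] for x in range(width)]
            for y in range(height)]


def blockmean_hash(image, blocks=8):
    """Simplified block-mean hash: one bit per block, set when the block mean
    is at or above the (upper) median of all block means."""
    height, width = len(image), len(image[0])
    means = []
    for by in range(blocks):
        for bx in range(blocks):
            y0, y1 = by * height // blocks, (by + 1) * height // blocks
            x0, x1 = bx * width // blocks, (bx + 1) * width // blocks
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(pixels) / len(pixels))
    median = sorted(means)[len(means) // 2]
    return [1 if m >= median else 0 for m in means]


def hamming(a, b):
    """Number of positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


original = make_image(64, 64)
duplicate = crop_and_zoom(original, margin=4)  # one synthetic benchmark pair
distance = hamming(blockmean_hash(original), blockmean_hash(duplicate))
print("Hamming distance for the crop-and-zoom pair:", distance)
```

Sweeping `margin` over a range of values would give pairs of increasing difficulty, which is one way to see where each fingerprint stops matching.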
## Tadas
* TF-IDF queries of names and addresses
* Evaluate the results from the MTurk workers who tagged image pairs from duplicate offers
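
To make the TF-IDF task concrete, the sketch below shows one possible shape for a name-and-address query, using only the standard library. The offer strings are invented for illustration, the function names are hypothetical, and the smoothed IDF formula is just one common choice; a real run would more likely use an existing library and the actual offer data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, treating commas as separators."""
    return text.lower().replace(",", " ").split()


def tfidf_vector(tokens, idf):
    """L2-normalised TF-IDF vector as a dict; tokens unknown to `idf` are dropped."""
    counts = Counter(tok for tok in tokens if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}


def build_index(docs):
    """Compute smoothed IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}
    vectors = [tfidf_vector(toks, idf) for toks in tokenized]
    return idf, vectors


def query(text, idf, vectors):
    """Return document indices ranked by cosine similarity, plus the raw scores."""
    q = tfidf_vector(tokenize(text), idf)
    scores = [sum(w * vec.get(tok, 0.0) for tok, w in q.items()) for vec in vectors]
    ranking = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Invented example offers (name plus address), not real data.
offers = [
    "Hotel Bella Vista, Via Roma 12, Napoli",
    "Bella Vista Hotel, Via Roma 12",
    "Pension Sonne, Hauptstrasse 5, Wien",
]
idf, vectors = build_index(offers)
ranking, scores = query("hotel bella vista via roma", idf, vectors)
for i in ranking:
    print(round(scores[i], 3), offers[i])
```

Because the vectors are L2-normalised, the dot product is a cosine similarity, so candidate duplicates can be shortlisted with a single score threshold before any image comparison.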