# RR Work Specification
### Endpoint to index/re-index a given chapter
- [ ] `handler` takes `doc_html`, `story_id`, and `doc_num`
- [ ] `model` to track `story_id` + `doc_num` + `avg_embedding` + `qdrant_point_id` = `DocEmbedding`
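A minimal sketch of the handler payload and the `DocEmbedding` model as Python dataclasses. The field types (integer `story_id`/`doc_num`, a UUID string for `qdrant_point_id`) are assumptions. The `created_at` field is also an assumption, added because the re-averaging steps compare `DocEmbedding` `created_at` values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import uuid


@dataclass(frozen=True)
class IndexChapterRequest:
    """Payload for the index/re-index endpoint (field types are assumptions)."""
    doc_html: str
    story_id: int
    doc_num: int


@dataclass
class DocEmbedding:
    """One row per chapter: the averaged embedding plus its Qdrant point ID."""
    story_id: int
    doc_num: int
    avg_embedding: list[float]
    # Qdrant accepts unsigned integers or UUIDs as point IDs; a UUID string is used here.
    qdrant_point_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Assumed column: lets the averaging endpoint find chapters indexed after a group.
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```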
#### Delete previous embeddings for chapter
- [ ] `query` to delete the chapter's `DocEmbedding` rows (filtered by `story_id` + `doc_num`) from Postgres
- [ ] `query` to delete the chapter's `DocEmbedding` points (filtered by `story_id` + `doc_num`) from Qdrant
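A sketch of the two deletes for one chapter, assuming a Postgres table named `doc_embedding` and that each Qdrant point stores `story_id` and `doc_num` in its payload (both are hypothetical names). The Qdrant body targets the REST `POST /collections/{collection_name}/points/delete` endpoint, which accepts a `filter` instead of explicit point IDs.

```python
def delete_chapter_sql(story_id: int, doc_num: int) -> tuple[str, tuple[int, int]]:
    """Parameterized Postgres DELETE (psycopg-style %s placeholders) for one chapter."""
    sql = (
        "DELETE FROM doc_embedding "  # hypothetical table name
        "WHERE story_id = %s AND doc_num = %s "
        "RETURNING qdrant_point_id"  # handy if deleting Qdrant points by ID instead
    )
    return sql, (story_id, doc_num)


def delete_chapter_qdrant_body(story_id: int, doc_num: int) -> dict:
    """Body for POST /collections/{collection_name}/points/delete.

    Matches every point whose payload has this story_id AND doc_num, so it also
    removes leftover points if Postgres and Qdrant have drifted apart.
    """
    return {
        "filter": {
            "must": [
                {"key": "story_id", "match": {"value": story_id}},
                {"key": "doc_num", "match": {"value": doc_num}},
            ]
        }
    }
```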
#### Create new embeddings for chapter
- [ ] segmentation script to break `doc_html` into individual segments to embed (mostly done)
- [ ] `operator` that sends segments to the GPU in batches to get an `embedding` for each
- [ ] `query` to insert `DocEmbedding` rows into Postgres
- [ ] `query` to insert `DocEmbedding` points into the Qdrant collection
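A sketch of the create path: split `doc_html` into segments, embed them in fixed-size batches, and average the results into the chapter's `avg_embedding`. All of the following are assumptions: segmenting on `<p>` tags (text outside `<p>` is ignored), the default batch size of 32, the hypothetical `embed_batch` callable standing in for the GPU operator, and treating `avg_embedding` as the element-wise mean of the segment embeddings.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, Optional


class _ParagraphParser(HTMLParser):
    """Collects the whitespace-normalized text of each <p> element as one segment."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.segments: list[str] = []
        self._buf: Optional[list[str]] = None  # non-None only while inside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buf is not None:
            text = " ".join("".join(self._buf).split())
            if text:  # skip empty paragraphs
                self.segments.append(text)
            self._buf = None

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def segment_html(doc_html: str) -> list[str]:
    parser = _ParagraphParser()
    parser.feed(doc_html)
    parser.close()
    return parser.segments


def chunks(items: list[str], size: int) -> Iterator[list[str]]:
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_chapter(
    doc_html: str,
    embed_batch: Callable[[list[str]], list[list[float]]],  # hypothetical GPU client
    batch_size: int = 32,
) -> tuple[list[list[float]], list[float]]:
    """Return (segment_embeddings, avg_embedding) for one chapter."""
    segments = segment_html(doc_html)
    if not segments:
        raise ValueError("doc_html produced no segments")
    vectors: list[list[float]] = []
    for batch in chunks(segments, batch_size):
        vectors.extend(embed_batch(batch))
    # Element-wise mean across all segment embeddings.
    avg = [sum(col) / len(vectors) for col in zip(*vectors)]
    return vectors, avg
```

For example, with a stub `embed_batch` of `lambda batch: [[float(len(s)), 1.0] for s in batch]`, the HTML `<p>ab</p><p>abcd</p>` yields an `avg_embedding` of `[3.0, 1.0]`.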
### Endpoint to average/re-average for a given story id
- [ ] `handler` takes `doc_group_size` and `story_id`
- [ ] `handler` creates new Qdrant collection for `doc_group_size` if it does not exist
- [ ] `model` to track `story_id` + `doc_group_size` + `doc_group_index` + `avg_embedding` + `qdrant_point_id` = `DocGroupEmbedding`
- [ ] `query` to get all `DocGroupEmbedding` rows for the `story_id`
- [ ] `logic` to find the oldest of the returned `DocGroupEmbedding` rows
- [ ] `query` to get the `story_id` + `doc_num` of any `DocEmbedding` rows whose `created_at` is more recent than that oldest `DocGroupEmbedding`
- [ ] `logic` to determine which `doc_group_index` values need to be upserted
- [ ] `logic` to calculate the `avg_embedding` for each of those `doc_group_index` values
- [ ] `query` to upsert the `DocGroupEmbedding` rows for those `doc_group_index` values
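A pure-Python sketch of the staleness check and re-averaging. It assumes `doc_num` starts at 1, that group `k` covers chapters `k * doc_group_size + 1` through `(k + 1) * doc_group_size`, and that a group's `avg_embedding` is the unweighted mean of its chapters' `avg_embedding`s. `DocRow` and `GroupRow` are hypothetical stand-ins for the query results. When the story has no groups yet, every group is treated as stale.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DocRow:  # hypothetical shape of a DocEmbedding query result
    doc_num: int
    avg_embedding: list[float]
    created_at: datetime


@dataclass
class GroupRow:  # hypothetical shape of a DocGroupEmbedding query result
    doc_group_index: int
    created_at: datetime


def group_index(doc_num: int, doc_group_size: int) -> int:
    # Assumes 1-based doc_num: chapters 1..size -> group 0, size+1..2*size -> group 1.
    return (doc_num - 1) // doc_group_size


def stale_group_indexes(docs, groups, doc_group_size):
    """Indexes of groups containing a chapter indexed after the oldest existing group."""
    if not groups:
        return {group_index(d.doc_num, doc_group_size) for d in docs}
    oldest = min(g.created_at for g in groups)
    return {
        group_index(d.doc_num, doc_group_size)
        for d in docs
        if d.created_at > oldest
    }


def group_averages(docs, doc_group_size, indexes):
    """Element-wise mean of chapter embeddings for each requested doc_group_index."""
    members: dict[int, list[list[float]]] = {}
    for d in docs:
        idx = group_index(d.doc_num, doc_group_size)
        if idx in indexes:
            members.setdefault(idx, []).append(d.avg_embedding)
    return {
        idx: [sum(col) / len(vecs) for col in zip(*vecs)]
        for idx, vecs in members.items()
    }
```

Note that `group_averages` needs every chapter in a stale group, not only the recently indexed ones, so the averaging step has to load all `DocEmbedding` rows for the affected groups.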
### Endpoint to recommend content
Possible edge case: users with very large numbers of liked/favorited stories, which produce very long `story_id[]` lists and, in turn, very long lists of positive examples.
- [ ] `handler` takes `story_id[]`, `doc_group_size`
- [ ] `query` to Postgres to get the `qdrant_point_id`s for the `story_id[]` and `doc_group_size`
- [ ] `query` to Qdrant to get recommendations from the `doc_group_size` collection, passing those `qdrant_point_id`s as the positive examples
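A sketch of the Qdrant request for this endpoint, targeting the REST `POST /collections/{collection_name}/points/recommend` endpoint with the looked-up `qdrant_point_id`s as `positive` examples. To cover the many-favorites edge case, it de-duplicates the IDs and caps the list by random sampling. The cap of 100, the sampling strategy, and the `doc_groups_{size}` collection naming are all assumptions.

```python
import random
from typing import Optional

MAX_POSITIVE = 100  # hypothetical cap on positive examples per request


def collection_name(doc_group_size: int) -> str:
    return f"doc_groups_{doc_group_size}"  # hypothetical naming scheme


def recommend_request(
    point_ids: list[str],
    doc_group_size: int,
    limit: int = 20,
    rng: Optional[random.Random] = None,
) -> tuple[str, dict]:
    """Return (path, body) for Qdrant's points/recommend endpoint."""
    if not point_ids:
        raise ValueError("need at least one liked/favorited story")
    ids = list(dict.fromkeys(point_ids))  # de-duplicate while keeping order
    if len(ids) > MAX_POSITIVE:
        # Sample rather than truncate so older favorites still get a chance.
        ids = (rng or random.Random()).sample(ids, MAX_POSITIVE)
    path = f"/collections/{collection_name(doc_group_size)}/points/recommend"
    body = {"positive": ids, "limit": limit, "with_payload": True}
    return path, body
```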
### Endpoint to semantically search
- [ ] `handler` takes `doc_group_size`, `query_string`
- [ ] `operator` to get embedding from GPU for `query_string`
- [ ] `query` to Qdrant to get semantically similar results for the `query_string` from the collection for the given `doc_group_size`
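For reference, a search on a Qdrant collection configured with `Cosine` distance ranks stored vectors by cosine similarity to the query embedding. The brute-force version below is a sketch for sanity-checking results on a small sample; it assumes the GPU operator has already produced `query_vec`. `search_body` targets the REST `POST /collections/{collection_name}/points/search` endpoint.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as dissimilar


def top_k(query_vec: list[float], points: dict[str, list[float]], k: int = 10):
    """Return the k (point_id, score) pairs most similar to query_vec, best first."""
    scored = ((pid, cosine(query_vec, vec)) for pid, vec in points.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


def search_body(query_vec: list[float], limit: int = 10) -> dict:
    # Body for POST /collections/{collection_name}/points/search
    return {"vector": query_vec, "limit": limit, "with_payload": True}
```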
### Endpoint for query vs. vector
Assuming we do the `*` work for `DocSubEmbedding`, then when `doc_group_size` is 1 we should search those instead of the groups.
- [ ] `handler` takes `doc_group_size`, `doc_num`, and `story_id`
- [ ] `logic` to grab the right `qdrant_collection`
- [ ] `query` to Qdrant to get semantically similar results for the stored vector of the given `story_id` + `doc_num` in the appropriate `doc_group_size` collection. The similarity scores can be used to inform content moderation decisions.
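A sketch of the collection choice and one possible moderation check: flagging hits whose similarity is at or above a threshold, for example to review near-copies of a chapter. It assumes a `doc_sub_embeddings` collection holds `DocSubEmbedding` points for the `doc_group_size == 1` case, that other sizes use a `doc_groups_{size}` collection, and that hits arrive as `(point_id, score)` pairs with cosine scores. The collection names and the 0.95 threshold are hypothetical.

```python
from typing import Iterable

SUB_EMBEDDING_COLLECTION = "doc_sub_embeddings"  # hypothetical
MODERATION_THRESHOLD = 0.95  # hypothetical cosine-similarity cutoff


def pick_collection(doc_group_size: int) -> str:
    if doc_group_size < 1:
        raise ValueError("doc_group_size must be >= 1")
    if doc_group_size == 1:
        return SUB_EMBEDDING_COLLECTION  # search sub-embeddings instead of groups
    return f"doc_groups_{doc_group_size}"


def flag_similar(
    hits: Iterable[tuple[str, float]],
    self_point_ids: set[str],
    threshold: float = MODERATION_THRESHOLD,
) -> list[tuple[str, float]]:
    """Hits at or above threshold, excluding the chapter's own points."""
    return [
        (pid, score)
        for pid, score in hits
        if pid not in self_point_ids and score >= threshold
    ]
```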