# Backfills

## Purpose

We need to backfill the ES index with data from products so that we have a starting point from which to provide data to the search service.

## Strategies

I have a few proposed approaches.

### Go command - Menu Service

An executable `go` command should exist on the menu service. This would go through each database product and publish its representation to the message queue (see the sketch at the end of this document).

#### Pros

* Executable via the command line
* Can be run in GoCD, similar to a rake task
* Separate binary from the main menu service (separation of concerns)
* Does not need its own monitoring if it writes to stdout/stderr

#### Cons

* Not as easily invoked
* Might not be able to rely on desired existing modules (e.g. session)

### API Endpoint

The menu service should expose an endpoint that would only be queried within the Slice development environment. When queried, this endpoint would build up the data based on passed-in product IDs (or all product IDs) and publish its representation to the message queue. A sketch of such a handler also appears at the end of this document.

#### Pros

* Easily invoked - just a simple cURL request away
* More integrated; easier access to internal modules (e.g. NewRelic, logging, etc.)
* Likely the easiest to implement, for the reasons above

#### Cons

* At least loosely coupled to menu service logic - no separate binary
* Cannot be invoked via the command line

### `rake` or similar task runner

It would be convenient to run the search backfill as a one-off task via `rake` or a similar task runner. This could live in the `task-runner` pipeline but be separate from the menu service logic. See [Running Rake Tasks on Staging/Production](https://mypizza.atlassian.net/wiki/spaces/DEVOPS/pages/458391553/Running+Rake+Tasks+on+Staging+Production).

#### Pros

* One-off job specifically suited for this purpose
* Isolated from the menu service

#### Cons

* Need to find a decent, well-supported/maintained library
* Might not be able to rely on desired existing modules (e.g. NewRelic, logging, etc.)
* Invoking this would be less obvious

## General Workflow

* A list of product or shop IDs is received from the user
* If empty, query the datastore for each shop and, for each shop, each product
* Relevant info to fetch:
  * [Clubhouse story - create product mapping](https://app.clubhouse.io/slicelife/story/64429/es-create-the-product-mapping)
  * [GitHub - product mapping JSON](https://github.com/slicelife/search/blob/master/elastic_search_settings/product_mapping.json)
* Craft a message payload that contains entries for each shop
* Using internal API methods to read data, for each shop:
  * Fetch each product
  * Use the product ID (NOT grouped - see [this story](https://app.clubhouse.io/slicelife/story/64569/spike-find-out-unique-identifier-for-elasticsearch-document))
  * Append relevant product and shop info to the request body entry
  * If anything fails at this point, trap the error and report it to the user
* Publish to the message queue
  * Include a message header to signify a bulk update
  * It may be desirable to leverage the existing publish-to-queue functionality rather than introduce an ES dependency (i.e. the bulk API)
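
To make the Go-command strategy and the workflow above concrete, here is a minimal sketch. Everything in it is hypothetical: the `Product` shape, the `fetchAllProducts`/`fetchProductsByID` helpers, and `publishBulk` stand in for whatever product-reading and queue-publishing modules the menu service actually exposes.

```go
// Sketch of the backfill command. All types and helpers below are
// placeholders for the menu service's real datastore and queue modules.
package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"log"
	"os"
	"strings"
)

// Product is a placeholder for the menu service's product record.
type Product struct {
	ID     int64  `json:"id"` // ungrouped product ID (the ES document ID)
	ShopID int64  `json:"shop_id"`
	Name   string `json:"name"`
}

func main() {
	ids := flag.String("product-ids", "", "comma-separated product IDs; empty means all")
	flag.Parse()

	// 1. Resolve the set of products to backfill.
	var products []Product
	var err error
	if *ids == "" {
		products, err = fetchAllProducts() // hypothetical datastore query
	} else {
		products, err = fetchProductsByID(strings.Split(*ids, ","))
	}
	if err != nil {
		// Report to stderr so GoCD (or the operator) sees the failure.
		log.Fatalf("backfill: fetching products: %v", err)
	}

	// 2. Craft a payload with one entry per product.
	payload, err := json.Marshal(products)
	if err != nil {
		log.Fatalf("backfill: encoding payload: %v", err)
	}

	// 3. Publish to the message queue, with a header signifying a bulk update.
	if err := publishBulk(payload); err != nil {
		log.Fatalf("backfill: publishing: %v", err)
	}
	fmt.Fprintf(os.Stdout, "backfill: published %d products\n", len(products))
}

// Stubs; the real command would call into the menu service's modules.
func fetchAllProducts() ([]Product, error)              { return nil, nil }
func fetchProductsByID(ids []string) ([]Product, error) { return nil, nil }
func publishBulk(body []byte) error                     { return nil }
```

Invocation would be something like `./backfill -product-ids=1,2,3` (or no flag to backfill everything), writing progress to stdout/stderr as the Go-command pros assume.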
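
For comparison, here is a similarly hedged sketch of the API-endpoint variant. The route path, port, and request shape are illustrative, not an existing menu service API; the stubs mirror those in the command sketch so the block stands alone.

```go
// Hypothetical handler for the API-endpoint strategy.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Product mirrors the placeholder record from the command sketch.
type Product struct {
	ID     int64  `json:"id"`
	ShopID int64  `json:"shop_id"`
	Name   string `json:"name"`
}

// Stubs for the menu service's real datastore and queue modules.
func fetchAllProducts() ([]Product, error)              { return nil, nil }
func fetchProductsByID(ids []string) ([]Product, error) { return nil, nil }
func publishBulk(body []byte) error                     { return nil }

type backfillRequest struct {
	ProductIDs []string `json:"product_ids"` // empty means "all products"
}

// handleBackfill builds the payload and publishes it, as the workflow
// list describes, but triggered over HTTP instead of the command line.
func handleBackfill(w http.ResponseWriter, r *http.Request) {
	var req backfillRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "bad request body", http.StatusBadRequest)
		return
	}

	var (
		products []Product
		err      error
	)
	if len(req.ProductIDs) == 0 {
		products, err = fetchAllProducts()
	} else {
		products, err = fetchProductsByID(req.ProductIDs)
	}
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	payload, err := json.Marshal(products)
	if err == nil {
		err = publishBulk(payload)
	}
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusAccepted)
}

func main() {
	// Registered only in the development environment, per the strategy above.
	http.HandleFunc("/internal/search-backfill", handleBackfill)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Triggering it would then be the cURL call the pros mention, e.g. `curl -X POST http://localhost:8080/internal/search-backfill -d '{"product_ids": []}'` (URL illustrative).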