## BIG Data

### Assumptions

- The millions of rows of purchase orders & their relevant products are already in the DB via the replication process run by Soriana.
- We won't need any data other than what is in the DB after replication.
- This process is only required for getting the linked POs for scanned products.

### Steps

- Sync process
- Data manipulation

---

#### Sync Process

This process consumes the data from the tables updated by the replication process and finds the relations between products & POs via Redis jobs (a sketch follows the glossary at the end of this document). The steps are as follows:

- The idea is to read the data in one pass.
- Once it's loaded, we break the data into chunks and assign each chunk to a job (processing everything in a single job would cause an out-of-memory error on Node).
- Since Redis identifies jobs via a string value, something like `retry-return-folios`, we will add a unique value based on the chunk index to such jobs so we can identify which jobs failed and which succeeded, if any. Example:
  1. Data will be loaded as an array; say the chunk size is 1,000 rows per batch, which gives us Redis job identifiers suffixed with the batch index (e.g. `retry-return-folios-0`, `retry-return-folios-1`, ...).
  2. We won't be using async/await inside a loop.
- Inside each batch we will first run the computations that find the relations between POs & their products, along with the relevant product details.

**Note:** There is also a possibility of calling `getOrCreateProductQuantity` or `getOrCreateLocationStock` for products not in the DB (also a computation).

---

#### Data Manipulation

This process comes into play when we actually need the POs linked to products (while scanning). The basic idea is to find the related POs (within their validity) and send them in the same format as SAP currently sends. In order to send them, the exact logic needs to be reproduced on Node as shared via the files on email (ABAP code). A sketch of the lookup is included after the glossary.

### Glossary

- POs: Purchase Orders
- DB: MySQL Database
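---

Below is a minimal sketch of the chunked sync process described above. It assumes BullMQ as the Redis job library; the queue name, row shape, and chunk size are illustrative only and not part of the original spec.

```typescript
// Minimal sketch of the chunked sync, assuming BullMQ as the Redis job library.
// PoProductRow and the queue name "po-product-sync" are hypothetical.
import { Queue, Worker } from "bullmq";

interface PoProductRow {
  poNumber: string;   // hypothetical shape of a replicated row
  productSku: string;
}

const connection = { host: "127.0.0.1", port: 6379 };
const CHUNK_SIZE = 1000; // rows per batch, tuned to avoid OOM on Node

const syncQueue = new Queue("po-product-sync", { connection });

async function enqueueSyncJobs(rows: PoProductRow[]): Promise<void> {
  const jobs = [];
  for (let i = 0; i < rows.length; i += CHUNK_SIZE) {
    const chunkIndex = i / CHUNK_SIZE;
    jobs.push({
      name: "po-product-sync",
      data: { chunkIndex, rows: rows.slice(i, i + CHUNK_SIZE) },
      // Chunk-index-based job id, so failed vs. succeeded chunks can be told apart
      opts: { jobId: `po-product-sync-${chunkIndex}` },
    });
  }
  // addBulk enqueues everything in one call instead of awaiting inside the loop
  await syncQueue.addBulk(jobs);
}

// Each worker invocation handles one chunk: compute the PO <-> product relations
// and, where a product is missing from the DB, fall back to
// getOrCreateProductQuantity / getOrCreateLocationStock (both omitted here).
new Worker(
  "po-product-sync",
  async (job) => {
    const { chunkIndex, rows } = job.data as { chunkIndex: number; rows: PoProductRow[] };
    // ...relation computations for this chunk go here
    console.log(`processed chunk ${chunkIndex} (${rows.length} rows)`);
  },
  { connection }
);
```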
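And a sketch of the scan-time lookup: filtering the linked POs that are still within validity and mapping them into a SAP-like response. All field names here are placeholders; the actual response shape and mapping must follow the ABAP code shared over email.

```typescript
// Hypothetical scan-time lookup, assuming the POs linked to a product are
// already available in memory. Field names (validFrom, validTo, EBELN, ...)
// are placeholders until the ABAP mapping is ported.
interface LinkedPo {
  poNumber: string;
  validFrom: Date;
  validTo: Date;
}

function getValidLinkedPos(linkedPos: LinkedPo[], at: Date = new Date()) {
  return linkedPos
    .filter((po) => po.validFrom <= at && at <= po.validTo) // only POs within validity
    .map((po) => ({
      EBELN: po.poNumber, // placeholder for the SAP PO-number field
      // ...remaining fields mapped 1:1 from the ABAP logic
    }));
}
```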