# BC Registries Issuer Updates
## Existing BC Reg Issuer Process
The existing BC Reg issuer jobs are described [here](https://github.com/bcgov/von-bc-registries-agent/tree/master/data-pipeline). (The docs are a bit out of date but the overall architecture/approach hasn't changed.) The data is pulled from COLIN (old Oracle database) and generated credentials are posted to OrgBook. The event processor uses a postgres database to store "in process" and "posted" data.
All python code dealing with reading the BC Reg database and loading data is [here](https://github.com/bcgov/von-bc-registries-agent/blob/master/data-pipeline/bcreg/bcregistries.py).
All python code dealing with parsing the staged json data and generating credentials (and maintaining the event processor tables) is [here](https://github.com/bcgov/von-bc-registries-agent/blob/master/data-pipeline/bcreg/eventprocessor.py).
The jobs are run using cron - there is a bunch of infrastructure based on [Mara Pipelines](https://github.com/mara/mara-pipelines), but it is not providing much value and we should consider refactoring it out of existence someday.
Basically the steps are (these run every 30 minutes in `prod`, and on request in `dev` and `test`):
1. Scan the source `events` table and determine what corps have been updated since the last run - update the [event_by_corp_filing](https://github.com/bcgov/von-bc-registries-agent/blob/master/data-pipeline/bcreg/eventprocessor.py#L209) table with a set of corps to be processed. The "event date" is used to determine what has been posted and what is new (there are multiple sources for the `event_id` so it is not sequential) - this is recorded in the [last_event](https://github.com/bcgov/von-bc-registries-agent/blob/master/data-pipeline/bcreg/eventprocessor.py#L188) table.
2. Load all data for each corp into a big json blob and store in the [corp_history_log](https://github.com/bcgov/von-bc-registries-agent/blob/master/data-pipeline/bcreg/eventprocessor.py#L253) table. All data is loaded into json, including the full history.
3. For each corp in the corp_history_log table, and based on the from/to events we are processing, create credentials to post to OrgBook and store in the [credential_log](https://github.com/bcgov/von-bc-registries-agent/blob/master/data-pipeline/bcreg/eventprocessor.py#L312) table.
4. Post credentials to OrgBook and update the status in the `credential_log` table.
Each event processor table (`event_by_corp_filing`, `corp_history_log`, `credential_log`) has a set of fields to record the processing status (pending, processed or error). Data can be re-processed just by changing the status to pending. The credential staging table includes a hash of the credential to prevent duplicates being posted to OrgBook.
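For illustration, the duplicate check amounts to something like the following (the table/column names and function signatures here are simplified assumptions; the real logic lives in `eventprocessor.py`):
```python
import hashlib
import json

def credential_hash(cred: dict) -> str:
    # Stable hash of the credential content; sort_keys makes the hash
    # independent of key ordering in the dict.
    return hashlib.sha256(
        json.dumps(cred, sort_keys=True).encode("utf-8")
    ).hexdigest()

def already_posted(cursor, cred: dict) -> bool:
    # Column name 'credential_hash' is assumed for illustration.
    cursor.execute(
        "SELECT 1 FROM credential_log WHERE credential_hash = %s",
        (credential_hash(cred),),
    )
    return cursor.fetchone() is not None
```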
## Updates Required to support the new LEAR Database
Theoretically we just need to add new jobs for steps 1 and 2 above - if we load the same json blob structure for each corporation, then steps 3 and 4 will probably require only minimal changes to deal with differences between the old COLIN and new LEAR databases.
### 1b. Monitor LEAR for updates
We need to add a new job to monitor the new `transaction` table, and create two new tables in the event processor database to record the status of LEAR corporation updates:
- `last_transaction` - similar to the existing `last_event`, record the last LEAR transaction processed. I believe we should use `filing_date` (not `transaction_id`) to record the latest processed transaction, which is consistent with how we currently process events
- `transaction_by_corp_filing` - similar to the existing `event_by_corp_filing`, this will include a list of companies to process from LEAR (based on the transactions since the last update)
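A rough sketch of the two new tables (the column names below are illustrative only - the real schemas should mirror whatever `last_event` and `event_by_corp_filing` actually look like):
```python
# Illustrative DDL only -- mirror the real last_event and
# event_by_corp_filing schemas when implementing.
CREATE_LEAR_TABLES = """
CREATE TABLE IF NOT EXISTS last_transaction (
    record_id    SERIAL PRIMARY KEY,
    filing_date  TIMESTAMP NOT NULL,  -- high-water mark (not transaction_id)
    entry_date   TIMESTAMP NOT NULL
);
CREATE TABLE IF NOT EXISTS transaction_by_corp_filing (
    record_id       SERIAL PRIMARY KEY,
    corp_num        VARCHAR(10) NOT NULL,
    filing_date     TIMESTAMP NOT NULL,
    process_date    TIMESTAMP,          -- null until processed
    process_success CHAR(1),            -- pending/processed/error status
    entry_date      TIMESTAMP NOT NULL
);
"""
```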
We maintain a [list of company types](https://github.com/bcgov/von-bc-registries-agent/blob/master/data-pipeline/bcreg/eventprocessor.py#L83) to load from COLIN, so we will need a corresponding list of company types to load from LEAR. As new company types are moved over, we can move the types between these two lists.
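Something as simple as two module-level lists would do (the names and types below are hypothetical; the real in-scope list is maintained in `eventprocessor.py`):
```python
# Hypothetical lists -- as a company type migrates, move it from the
# COLIN list to the LEAR list so each corp has exactly one source.
COLIN_CORP_TYPES = ["BC", "ULC", "CC"]  # still loaded from COLIN
LEAR_CORP_TYPES = ["BEN"]               # migrated to LEAR
```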
### 2b. Load Corporation Data from LEAR
There are a few options for this one.
The current process is executed from [this function](https://github.com/bcgov/von-bc-registries-agent/blob/master/data-pipeline/bcreg/eventprocessor.py#L1876), which loops through all the companies identified in the previous step.
That runs [this function](https://github.com/bcgov/von-bc-registries-agent/blob/master/data-pipeline/bcreg/eventprocessor.py#L1547), which is a bit convoluted because it is parametrized to run each processing step separately, or all steps together for each company.
Eventually it calls [this function](https://github.com/bcgov/von-bc-registries-agent/blob/master/data-pipeline/bcreg/bcregistries.py#L1566) to load the data from BC Registries (COLIN) into a json blob, which is stored in the event processor database.
As you can see, this just loads a json blob with the same structure as the underlying BC Reg database.
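For LEAR, the equivalent loop might look roughly like this (the function and column names are assumptions, not the real API; `load_corp_json` stands in for whatever builds the COLIN-shaped blob from the LEAR database):
```python
import json

def process_lear_corps(event_proc_conn, lear_conn, load_corp_json):
    # Find the corps queued up by step 1b (column names assumed).
    with event_proc_conn.cursor() as cur:
        cur.execute(
            "SELECT corp_num FROM transaction_by_corp_filing"
            " WHERE process_date IS NULL"
        )
        corps = [row[0] for row in cur.fetchall()]

    for corp_num in corps:
        # Build a blob with the same shape as the COLIN blob so that
        # steps 3 and 4 continue to work with minimal changes.
        corp_json = load_corp_json(lear_conn, corp_num)
        with event_proc_conn.cursor() as cur:
            cur.execute(
                "INSERT INTO corp_history_log (corp_num, corp_json)"
                " VALUES (%s, %s)",
                (corp_num, json.dumps(corp_json)),
            )
            cur.execute(
                "UPDATE transaction_by_corp_filing"
                " SET process_date = now() WHERE corp_num = %s",
                (corp_num,),
            )
    event_proc_conn.commit()
```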
#### Option 2b.1 - just modify the SQL statements
When loading data from LEAR, just modify the SQL statements to alias the LEAR columns to the old COLIN column names - this is probably the easiest approach.
For example, for this statement:
```sql
SELECT corp_num, corp_typ_cd, recognition_dts, last_ar_filed_dt, bn_9, bn_15, admin_email, last_ledger_dt
FROM corporation
WHERE corp_num = <some corp_num>
```
Use something like:
```sql
SELECT identifier as corp_num,
legal_type as corp_typ_cd,
founding_date as recognition_dts,
null as last_ar_filed_dt,
null as bn_9,
bn_15 as bn_15,
? as admin_email,
null as last_ledger_dt
FROM business
WHERE identifier = <some corp_num>
```
There will be some conditional logic, since not all COLIN fields have LEAR equivalents.
Most significantly, LEAR provides the transaction effective date directly (the event effective date in COLIN), so we don't need to migrate the convoluted logic we currently use to determine event effective dates.
The BN_9 isn't provided (we post this to OrgBook), but we can derive it from the BN_15 (the first 9 digits).
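For example (minimal sketch):
```python
def bn_9_from_bn_15(bn_15):
    # The first 9 digits of the 15-digit business number are the BN_9.
    return bn_15[:9] if bn_15 else None
```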
#### Option 2b.2 - change the logic when building the stored json object
I think this makes less sense, as it is more work than just updating the SQL queries and doesn't add any value.
For example, [this method](https://github.com/bcgov/von-bc-registries-agent/blob/master/data-pipeline/bcreg/bcregistries.py#L1478) (like many others) doesn't even use the database column names; it just uses the offset into the SQL result set.
#### Option 2b.3 - generate a new json object based on the new LEAR table structure
This would *also* require a new set of logic for step 3, which parses the json object and generates the credentials.
I think this is probably the best option for the long term, as we can ditch a lot of the "legacy" logic required to parse COLIN data; however, it may not be feasible for the initial company types we are loading from LEAR.
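Purely to illustrate the idea, a LEAR-native blob might keep the new column names instead of mapping back to COLIN's (the structure below is entirely hypothetical):
```python
# Hypothetical shape only -- the real structure would be designed
# alongside the new step-3 credential-generation logic.
lear_corp_blob = {
    "identifier": "BC1234567",   # LEAR's column names, not COLIN's
    "legal_type": "BEN",
    "founding_date": "2020-01-01T00:00:00+00:00",
    "filings": [
        {"filing_type": "changeOfAddress", "effective_date": "2021-06-01"},
    ],
}
```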