# Tokenization procedure
v 0.1
[TOC]
## Current Flow
```plantuml
@startuml
start
:<b>STEP 1</b>\n[IT] Files with exported profiles are downloaded \nfrom the provider's (e.g. Stripe, Affinipay) SFTP \n(sometimes 2 of them, one for CC, one for ACH/BA)\n FILE COUNT SO FAR: 2;
:<b>STEP 2</b>\n[IT] Sensitive columns are masked: \nwe create masked copies of those files for the Ops team\n FILE COUNT SO FAR: 4;
:<b>STEP 3</b>\n[IT] Masked files are attached to the JIRA ticket;
:<b>STEP 4</b>\n[Ops] Files are reviewed; addresses are amended etc.;
:<b>STEP 5</b>\n[Ops] Passes files back to IT for \ntokenization (usually a subset, or in multiple files)\n FILE COUNT SO FAR: 6?;
:<b>STEP 6</b>\n[IT] Performs checks on files, matches \nthem against the original, unmasked files, \nresolves problems, and produces files for \ntokenization (one for CC records, one for ACH/BA records)\n FILE COUNT SO FAR: 8;
:<b>STEP 7</b>\n[IT] Runs tokenization scripts on those 2 files.\n Each run outputs 2 files \n(an errors file and a successful-tokenizations file), \nso for the entire step - 4 files\n FILE COUNT SO FAR: 12;
:<b>STEP 8</b>\n[IT] Puts files in the JIRA ticket for the Ops team to pick up;
stop
```
>[color=red] <b>Note</b>: if some data is missing (which is often the case), another wave of this operation is necessary; steps 4 to 8 have to be repeated.
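The masking in STEP 2 can be sketched as below. This is only an illustration, not the actual IT script: the set of sensitive columns and the keep-last-4 masking rule are assumptions.

```python
# Minimal sketch of STEP 2 (masking sensitive columns).
# SENSITIVE and the keep-last-4 rule are assumptions.
import csv
import io

SENSITIVE = {"card.number", "email"}  # assumed list of columns to mask


def mask_value(value: str) -> str:
    """Replace everything but the last 4 characters with 'x'."""
    if len(value) <= 4:
        return "x" * len(value)
    return "x" * (len(value) - 4) + value[-4:]


def mask_csv(text: str) -> str:
    """Return a masked copy of a CSV export for the Ops team."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in SENSITIVE & set(row):
            if row[col]:
                row[col] = mask_value(row[col])
        writer.writerow(row)
    return out.getvalue()
```

The unmasked originals stay with IT; only the output of `mask_csv` would be attached to the JIRA ticket in STEP 3.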
## Current problems
- Pushing around too many different files, with different data formats: source files are different, revised files different, output files also different
- Necessity to continuously combine files and to perform comparison and validation
- (often) lack of a unique reference column, which makes comparing files and finding records difficult, or just much slower - records have to be matched by multiple columns
- lack of standardization, and thus difficulty with tracking and securing data quality
- data quality problems (addresses first and foremost)
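The missing-reference-column problem is what forces matching on a composite key. A minimal sketch of that fallback, assuming (hypothetically) that name + email + ZIP identify a record - which is exactly the fragile part:

```python
# Sketch of matching records WITHOUT a unique reference column:
# fall back on a composite key built from several fields.
# The chosen fields are an assumption; collisions remain possible.
def composite_key(row: dict) -> tuple:
    return (
        row.get("name", "").strip().lower(),
        row.get("email", "").strip().lower(),
        row.get("card.address_zip", "").strip(),
    )


def match(originals: list, revised: list):
    """Split revised rows into those found in originals and the rest."""
    index = {composite_key(r): r for r in originals}
    matched, unmatched = [], []
    for row in revised:
        target = matched if composite_key(row) in index else unmatched
        target.append(row)
    return matched, unmatched
```

A single `mxReference` column (as proposed below in V 2.0) replaces all of this with a plain dictionary lookup.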
## Proposed flow V 2.0
```plantuml
@startuml
start
:<b>STEP 1</b>\n[IT] Files with exported profiles are downloaded \nfrom the provider's (e.g. Stripe's) SFTP \n(sometimes 2 of them, one for CC, one for ACH/BA)\n FILE COUNT SO FAR: 2;
#palegreen:<b>STEP 2</b>\n[IT] Files go through scripts that:\n 1) combine files into 1 (CC and ACH/BA records together)\n 2) validate data\n 3) autocorrect data where safe & possible\n 4) auto-fix address fields\n 5) add a unique reference column [mxReference]\n and a status column [mxStatus] indicating whether we as \n IT believe the record to be valid and ready for tokenization\n 6) create a masked version of the file for the Ops team\n FILE COUNT SO FAR: 3;
:<b>STEP 3</b>\n[IT] Masked files are attached to the JIRA ticket;
:<b>STEP 4</b>\n[Ops] Files are reviewed; addresses are amended etc.;
#palegreen:<b>STEP 5</b>\n[Ops] Passes a new version of the same file back to IT for \ntokenization:\n 1) the number of records stays the same \n(no additions, no subtractions unless agreed)\n 2) always the same format, CSV\n 3) if some records are not to be tokenized, \nOps changes their mxStatus \n 4) if extra columns are to be added for the \nvalidation/inspection process, \nthey are added with an mx prefix after the original columns\n FILE COUNT SO FAR: 3;
:<b>STEP 6</b>\n[IT] Performs checks on file received from the Ops\n FILE COUNT SO FAR: 3;
#palegreen:<b>STEP 7</b>\n[IT] Runs tokenization scripts on the single file\n containing both CC and ACH records.\n Each run outputs the same file,\n only with extra columns:\n- mxStatus (tokenized|tokenization_error),\n- mxTokenizedAt,\n- columns depending on tokenization output\n (error message or token data)\n FILE COUNT SO FAR: 3;
:<b>STEP 8</b>\n[IT] Puts files in the JIRA ticket for the Ops team to pick up;
stop
```
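The new STEP 2 can be sketched roughly as follows. `is_valid` is a placeholder assumption, and the masking, autocorrection and address geo-fixing parts are omitted here; the point is the merge plus the `mxReference`/`mxStatus` stamping:

```python
# Sketch of the proposed STEP 2: merge the CC and ACH/BA exports into
# one CSV and stamp each record with mxReference and mxStatus.
# The validation rule is a placeholder assumption.
import csv
import io
import uuid


def is_valid(row: dict) -> bool:
    # Placeholder: require a country code (assumed rule).
    return bool(row.get("card.address_country"))


def prepare(cc_csv: str, ach_csv: str) -> str:
    rows, fieldnames = [], []
    for text in (cc_csv, ach_csv):
        reader = csv.DictReader(io.StringIO(text))
        for name in reader.fieldnames or []:
            if name not in fieldnames:  # union of both headers, in order
                fieldnames.append(name)
        rows.extend(reader)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["mxReference"] + fieldnames + ["mxStatus"], restval=""
    )
    writer.writeheader()
    for row in rows:
        row["mxReference"] = str(uuid.uuid4())
        row["mxStatus"] = "ready_for_tokenization" if is_valid(row) else "invalid"
        writer.writerow(row)
    return out.getvalue()
```

From this point on, every later step (Ops review, tokenization, error reporting) can address a record by its `mxReference` alone.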
### Expected benefits
- procedure simplified, more manageable
- less manual intervention required from both Ops and IT
- errors and problems easier to discover and track
## TODOs [IT]
- prepare preValidation script (input files)
- ~~prepare Geo-Checker script for auto-fixing addresses~~
- change the Tokenization script to accept unified files in the new format, and to output files in the same format
## Current CSV row vs Proposed future CSV row
### Current Row
```csvpreview {header="true"}
description,name,email,id,card.address_city,card.address_country,card.address_line1,card.address_line2,card.address_state,card.address_zip,card.exp_month,card.exp_year,card.id,card.name,card.number,default_source,card.transaction_ids
John Doe,John Doe,john.doe@gmail.com,cus_Pt7EYURu9wJhC8,,US,,,,92057,4,2027,card_1P3L03K6NnMfWf5L2WaPJgXJ,John Doe,xxxxxxxxxxx1007,card_1P3L03K6NnMfWf5L2WaPJgXJ,7470871316301
```
### V2 Row
```csvpreview {header="true"}
mxReference,description,name,email,id,card.address_city,card.address_country,card.address_line1,card.address_line2,card.address_state,card.address_zip,card.exp_month,card.exp_year,card.id,card.name,card.number,default_source,card.transaction_ids,mxStatus,mxAutoFixes,mxValidationErrors,mxTokenizationError,mxCustomerVaultToken,mxVaultToken,mxTransactionId,CustomColumn1
5e268d41-ed44-4b6f-bba9-3098dca750b8,John Doe,John Doe,john.doe@gmail.com,cus_Pt7EYURu9wJhC8,,US,,,,92057,4,2027,card_1P3L03K6NnMfWf5L2WaPJgXJ,John Doe,xxxxxxxxxxx1007,card_1P3L03K6NnMfWf5L2WaPJgXJ,7470871316301,ready_for_tokenization,"{""card.address_country"": [""United States of America"", ""US""]}",,(If tokenization failed),(If tokenization succeeded),(If tokenization succeeded),(If tokenization succeeded),
```
#### mxStatuses, possible values
- [ ] ready_for_tokenization
- [ ] invalid
- [ ] tokenized
- [ ] tokenization_error
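These statuses form a small lifecycle, and since Ops edits the CSV by hand, checking the values (and transitions) explicitly catches typos early. A sketch with an assumed transition map - the document does not specify which transitions are allowed, so the map below is a guess from the flow above:

```python
# Sketch: treat mxStatus as a closed set with explicit transitions,
# so a typo in a hand-edited file fails fast.
# ALLOWED_TRANSITIONS is an assumption, not a documented rule.
MX_STATUSES = {
    "ready_for_tokenization",
    "invalid",
    "tokenized",
    "tokenization_error",
}

ALLOWED_TRANSITIONS = {
    "ready_for_tokenization": {"tokenized", "tokenization_error", "invalid"},
    "invalid": {"ready_for_tokenization"},            # Ops fixed the record
    "tokenization_error": {"ready_for_tokenization"}, # retry after a fix
    "tokenized": set(),                               # terminal state
}


def check_transition(old: str, new: str) -> bool:
    """True if moving from old to new is allowed; raise on unknown values."""
    if old not in MX_STATUSES or new not in MX_STATUSES:
        raise ValueError(f"unknown mxStatus: {old!r} -> {new!r}")
    return new in ALLOWED_TRANSITIONS[old]
```

Running this check in STEP 6 (when IT validates the file returned by Ops) would flag both misspelled statuses and illegal edits, e.g. reverting an already-tokenized record.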