# Txn classification issue

## Description issue

- [x] Li xi (lucky money)
  - Sample xid: TF240210097557930
  - Description: `Timo LI XI`
  - Expected tag: **Gift**
  - Actual prediction: **Food & Drink**
  - Reason: `timo li xi` is eliminated by the preprocessing Scala script.
  - Solution: modify the `word_replacements.json` file in fortress.
- [x] Đóng tiền học (tuition payment) *from Trí Trần*
  - Sample xid: TF240129175132578
  - Description (preprocessed): `nguyen y van hoc ky 2333 22011874`
  - Expected tag: **Education**
  - Actual prediction: **Transport**
  - Reason: `nguyen` matches `chuyen` because of **fuzzy matching**; combined with `van` (part of the user's name), the description matches the keyword `van chuyen`.
  - Solution:
    - Stop using fuzzy matching.
    - Up-weight bi-gram and tri-gram matches so that words that are separated and out of order, as in this case, do not bias the result.
    - Remove `des_acc_name` from the description.
- [x] Split bill sushi *from Duyen UX*
  - Sample xid: TF240221198914264
  - Description: `split bill sushi wagao`
  - Expected tag: **Food & Drink**
  - Actual prediction: **Miscellaneous Shopping**
  - Reason: the term `split bill` is matched. Because of that, the personalization rule matches this transaction with a past transaction that was retagged as Shopping.
  - Solution: add `split bill` to the list of terms removed in the preprocessing step. **Warning**: change the preprocessing in both the index and the query procedures.
  - ![image](https://hackmd.io/_uploads/SJoPhMX36.png)
- [ ] *Tri Tran* finds the tag unreasonable for a large-amount transaction that carries no information.
  - Sample xid: TF211110198924982
  - Expected tag: **Gift** ???
  - Actual tag: **Food & Drink**
  - Scenario: the user transfers a large amount of money and the transaction is tagged **Food & Drink**, which is not reasonable.
  - Solution: when there is no information, predict using the global distribution of each tag.
- [x] Rice from hell :skull:
  - Sample xid: TF240226189686554
  - Description: `E Van ck com am phu`
  - Expected tag: **Food & Drink**
  - Actual prediction: **Tax/Fee**
  - Reason: `am phu` (hell) rice is not a popular kind of rice. We have `com tam`, `com trua`, `com rang`, but not `com am phu`. `am phu` gets matched with `phu thu` and `phu phi`: `phu` matches `phu` exactly, and it also matches `thu` or `phi` because of fuzziness.
  - Solution:
    - Add `com` to the keyword list (just do it).
    - Adjust the fuzziness for short keywords.
- [x] VCAM txn - HARD_MATCH
  - Sample xid: TF240304171283682
  - Expected tag: **Investment**
  - Actual prediction: **Miscellaneous Shopping**
  - Reason: the back end can't capture the `target_account_number` for querying Elasticsearch. Observed in the log linked below: `targetAccNumber=null`
  - Goal: the `targetAccNumber` of a VCAM transaction is sent to the Elasticsearch query builder, i.e. `targetAccNumber != null`.
  - Log [here](https://timovn-my.sharepoint.com/personal/hoc_chuc_timo_vn/Documents/Microsoft%20Teams%20Chat%20Files/Log%20details-logs-2024-03-04%2010_47_43.txt)
- [ ] Merchant id in MERCHANT_POS_DR rule
  - Sample xid: TF240301250638671
  - Expected tag: **Shopping**
  - Actual tag: **Food & Drink**
  - Reason: the back end can't capture the `merchant_id` for querying Elasticsearch. Observed in the log linked below: `merchantNumber=null`
  - ~~Goal: the `merchantNumber` of a debit card transaction is sent to the Elasticsearch query builder, i.e. `merchantNumber != null`.~~
  - New solution: use `merchant name` instead, due to a technical issue.
  - Log [here](https://timovn-my.sharepoint.com/personal/hoc_chuc_timo_vn/Documents/Microsoft%20Teams%20Chat%20Files/Log%20details-logs-2024-03-05%2014_22_06.txt)
- [ ] Merchant collision
  ![image](https://hackmd.io/_uploads/HJHmvU_6a.png)
- [x] Remove name
- [ ] Merchant name not match
  - Sample xid: TF240323155050424
  - Expected tag: **Entertain** (because the merchant name is `CGV`)
  - Actual tag: **Shopping**
  - Reason: the merchant name is not updated.
  - Solution: the merchant name will be updated automatically when the next version launches.
- [ ] Mua xoai (buying mangoes) - sis Tam
  - Sample xid: TF240326155569904
  - Expected tag: **Groceries** (due to merchant name is `CGV`)
  - Actual tag: **Fashion**
  - Reason: there is no `xoài` in our keyword pool. `xoài` is too short and would cause noise if added to the keyword pool.
  - Solution: rework the keyword rules. (HARD!)
- [ ] Credit card - `ck`
  - Any txn that includes `ck` in its description is tagged as Credit card, which is incorrect.
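
The fuzzy-match collisions noted in the tuition-payment and `com am phu` issues can be reproduced with a plain edit-distance check. The sketch below is only an illustration, not the production matcher: it assumes the allowed number of edits depends on token length, similar to Elasticsearch's `AUTO` fuzziness (0 edits for 1-2 characters, 1 edit for 3-5, 2 edits for longer tokens), and it uses plain Levenshtein distance without transpositions. The function names and the `min_fuzzy_len` threshold in `strict_short_edits` are hypothetical, chosen to show what "adjust the fuzziness for short keywords" could look like.

```python
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def auto_like_edits(token: str) -> int:
    """Assumed current policy: length-based, similar to fuzziness AUTO."""
    if len(token) <= 2:
        return 0
    if len(token) <= 5:
        return 1
    return 2


def strict_short_edits(token: str, min_fuzzy_len: int = 6) -> int:
    """Hypothetical stricter policy: exact match for short tokens,
    at most one edit for tokens of min_fuzzy_len characters or more."""
    return 0 if len(token) < min_fuzzy_len else 1


def fuzzy_match(token: str, keyword: str,
                max_edits: Callable[[str], int]) -> bool:
    """True if the keyword is within the edits allowed for this token."""
    return levenshtein(token, keyword) <= max_edits(token)


if __name__ == "__main__":
    pairs = [("nguyen", "chuyen"), ("phu", "thu"), ("phu", "phi")]
    for token, keyword in pairs:
        print(token, keyword,
              "distance:", levenshtein(token, keyword),
              "auto-like:", fuzzy_match(token, keyword, auto_like_edits),
              "strict:", fuzzy_match(token, keyword, strict_short_edits))
```

Under these assumptions all three pairs match with the length-based policy (`nguyen`/`chuyen` is two edits apart, `phu`/`thu` and `phu`/`phi` one edit each), and none of them match with the stricter policy. Whether the real pipeline should disable fuzziness entirely or only tighten it for short tokens is still the open design choice from the solutions listed above.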