# Txn classification issue
## description issue
- [x] Li xi ("lucky money" gift)
  - Sample xid: TF240210097557930
  - Description: `Timo LI XI`
  - Expected tag: **Gift**
  - Actual prediction: **Food & Drink**
  - Reason: `timo li xi` is removed entirely by the Scala preprocessing script, so nothing is left to match on.
  - Solution: modify the `word_replacements.json` file in fortress.
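A minimal sketch of why the description vanishes. It assumes `word_replacements.json` maps a phrase to its replacement and that an empty replacement means the phrase is dropped. The real format in fortress may differ. `OLD_RULES`, `NEW_RULES` and `preprocess` are hypothetical, and the sketch is in Python rather than Scala:

```python
import json
import re

# Hypothetical contents of word_replacements.json: phrase -> replacement.
# An empty replacement means the phrase is dropped during preprocessing.
OLD_RULES = json.loads('{"timo li xi": ""}')
NEW_RULES = json.loads('{"timo": ""}')


def preprocess(description, rules):
    """Lowercase, apply whole-word replacements, then collapse whitespace."""
    text = description.lower()
    # Apply longer phrases first so they take priority over sub-phrases.
    for phrase in sorted(rules, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, rules[phrase], text)
    return " ".join(text.split())


print(repr(preprocess("Timo LI XI", OLD_RULES)))  # ''
print(repr(preprocess("Timo LI XI", NEW_RULES)))  # 'li xi'
```

With the old rule the whole description is erased, so the classifier has nothing to work with. Narrowing the rule to the bank name keeps `li xi` for the keyword matcher.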
- [x] Đóng tiền học (tuition payment) *from Trí Trần*
  - Sample xid: TF240129175132578
  - Description (preprocessed): `nguyen y van hoc ky 2333 22011874`
  - Expected tag: **Education**
  - Actual prediction: **Transport**
  - Reason: `nguyen` is matched to `chuyen` by **fuzzy matching**. Combined with `van` (part of the user's name), the description then matches the keyword `van chuyen` ("transport").
  - Solution:
    - stop using fuzzy matching;
    - up-weight bi-gram and tri-gram matches, so that keyword words appearing separated or out of order do not bias the prediction;
    - remove `des_acc_name` from the description.
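The proposed scoring change can be sketched with a toy scorer. This is not the Elasticsearch scoring actually in use. Every exact unigram hit counts 1. Every keyword bigram that appears contiguously and in order adds a bonus of `BIGRAM_WEIGHT`, which is a hypothetical parameter. With fuzzy matching off, `nguyen` no longer stands in for `chuyen`, and scattered words cannot earn the bigram bonus. The keyword `hoc ky` is a hypothetical Education keyword, used only for comparison:

```python
BIGRAM_WEIGHT = 3.0  # hypothetical boost for contiguous, in-order bigrams


def ngrams(tokens, n):
    """Return the set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def keyword_score(description, keyword):
    """Exact-match (no fuzziness) score of a keyword against a description."""
    desc = description.split()
    kw = keyword.split()
    # One point per keyword token found anywhere in the description.
    unigram_hits = sum(1 for tok in kw if tok in desc)
    # Bonus only when keyword bigrams appear side by side and in order.
    bigram_hits = len(ngrams(kw, 2) & ngrams(desc, 2))
    return unigram_hits + BIGRAM_WEIGHT * bigram_hits


desc = "nguyen y van hoc ky 2333 22011874"
print(keyword_score(desc, "van chuyen"))  # 1.0
print(keyword_score(desc, "hoc ky"))      # 5.0
```

Tri-grams would follow the same pattern with `ngrams(..., 3)` and their own weight. A description containing `chuyen ... van` in the wrong order would still score only its unigram hits.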
- [x] Split bill sushi *from Duyen UX*
  - Sample xid: TF240221198914264
  - Description: `split bill sushi wagao`
  - Expected tag: **Food & Drink**
  - Actual prediction: **Miscellaneous Shopping**
  - Reason: the term `split bill` is matched. Because of that, the personalized rule matches this transaction with a past transaction that the user had retagged as Shopping.
  - Solution: add `split bill` to the list of terms removed in the preprocessing step. **Warning**: change the preprocessing in both the indexing and the query procedures.
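The warning exists because indexed documents and incoming queries have to go through the same cleaning. If `split bill` is removed on only one side, index and query text no longer line up. Below is a minimal sketch that uses one `REMOVE_TERMS` list and a single `preprocess` function, called from both paths. All of these names are hypothetical, and so are `to_index_doc`, `to_query` and the past transaction `split bill ao thun`:

```python
import re

# Hypothetical list of uninformative terms stripped before matching.
REMOVE_TERMS = ["split bill"]


def preprocess(text):
    """Lowercase, drop REMOVE_TERMS as whole words, collapse whitespace."""
    text = text.lower()
    for term in REMOVE_TERMS:
        text = re.sub(r"\b" + re.escape(term) + r"\b", " ", text)
    return " ".join(text.split())


def to_index_doc(description, tag):
    # Index side: past (retagged) transactions stored for personalized rules.
    return {"description": preprocess(description), "tag": tag}


def to_query(description):
    # Query side: the incoming transaction, cleaned by the same function.
    return preprocess(description)


past = to_index_doc("split bill ao thun", "Shopping")
print(past["description"])                 # ao thun
print(to_query("split bill sushi wagao"))  # sushi wagao
```

After cleaning, the two transactions no longer share the `split bill` tokens, so the personalized rule has no reason to link them.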
- [ ] *Tri Tran* finds the tagging of large-amount transactions that carry no descriptive information unreasonable.
  - Sample xid: TF211110198924982
  - Expected tag: **Gift** ???
  - Actual tag: **Food & Drink**
  - Scenario: the user transferred a large amount of money, and the transaction was tagged **Food & Drink**, which is not reasonable.
  - Solution: when there is no information, predict using the global distribution of tags.
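Here is a minimal sketch of the fallback. It assumes an empty preprocessed description counts as "no information" and that the tags of historical transactions are available. `global_tag_distribution`, `predict` and the sample history are hypothetical, and the history is made up for illustration:

```python
from collections import Counter


def global_tag_distribution(historical_tags):
    """Relative frequency of each tag over all historical transactions."""
    counts = Counter(historical_tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def predict(description, rule_prediction, distribution):
    """Use the rule result, or fall back to the global prior if no info."""
    if description.strip():
        return rule_prediction
    # No usable text: pick the globally most frequent tag.
    return max(distribution, key=distribution.get)


# Made-up history, for illustration only.
history = ["Gift", "Gift", "Food & Drink", "Shopping", "Gift"]
dist = global_tag_distribution(history)
print(dist["Gift"])                       # 0.6
print(predict("", "Food & Drink", dist))  # Gift
```

Returning the most frequent tag is only one way to use the prior. The full `dist` could also be kept as a soft prediction.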
- [x] Rice from hell :skull:
  - Sample xid: TF240226189686554
  - Description: `E Van ck com am phu`
  - Expected tag: **Food & Drink**
  - Actual prediction: **Tax/Fee**
  - Reason:
    - `am phu` ("hell") rice is not a popular kind of rice. We know `com tam`, `com trua` and `com rang` (broken rice, lunch rice, fried rice), but what on earth is `com am phu`?
    - `am phu` gets matched with the keywords `phu thu` and `phu phi` (surcharge, extra fee). `phu` matches `phu` exactly and also matches `thu` or `phi` through fuzziness, so a single token covers both words of the keyword.
  - Solution:
    - add `com` (rice) to the keywords (just do it);
    - adjust fuzziness for short keywords.
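The mechanism can be checked by hand with edit distance. This sketch assumes the matcher behaves like Elasticsearch's `fuzziness: AUTO`. By default that is `AUTO:3,6`: terms of length 0 to 2 must match exactly, lengths 3 to 5 allow one edit, and longer terms allow two. Under that rule the 3-letter token `phu` is one edit away from both `thu` and `phi`. Raising the thresholds (e.g. `AUTO:4,7`) makes 3-letter terms exact, which is one way to adjust fuzziness for short keywords. Elasticsearch also counts transpositions as one edit by default. The plain Levenshtein distance below ignores that, which makes no difference for these examples:

```python
def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def auto_fuzziness(term, low=3, high=6):
    """Allowed edits for a term, mirroring Elasticsearch's AUTO:low,high."""
    if len(term) < low:
        return 0
    return 1 if len(term) < high else 2


def fuzzy_match(query_term, keyword_term, low=3, high=6):
    allowed = auto_fuzziness(query_term, low, high)
    return levenshtein(query_term, keyword_term) <= allowed


for kw in ["phu", "thu", "phi"]:
    print(kw, fuzzy_match("phu", kw), fuzzy_match("phu", kw, low=4, high=7))
# phu True True
# thu True False
# phi True False
```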
- [x] VCAM txn - HARD_MATCH
  - Sample xid: TF240304171283682
  - Expected tag: **Investment**
  - Actual prediction: **Miscellaneous Shopping**
  - Reason: the back end cannot get the `target_account_number` needed to query Elasticsearch, as seen in the log linked below: `targetAccNumber=null`
  - Goal: the `targetAccNumber` of VCAM transactions is passed to the Elasticsearch query builder, i.e. `targetAccNumber != null`.
  - Log [here](https://timovn-my.sharepoint.com/personal/hoc_chuc_timo_vn/Documents/Microsoft%20Teams%20Chat%20Files/Log%20details-logs-2024-03-04%2010_47_43.txt)
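Here is a sketch of a guard on the query-builder side, written in Python for illustration only; the back end's actual language and index mapping are not shown in these notes. The hypothetical `build_hard_match_query` uses standard Elasticsearch query DSL (a `bool` query with a `term` filter). The field name `target_account_number` is taken from the note above. The idea is to log and skip the rule when `targetAccNumber` is missing, rather than silently sending a query that can never hard-match:

```python
import logging

logger = logging.getLogger("txn_classifier")  # hypothetical logger name


def build_hard_match_query(target_acc_number):
    """Exact-match query on the target account, or None if it is missing."""
    if not target_acc_number:
        # The case seen in the VCAM log: targetAccNumber=null.
        logger.warning("targetAccNumber is null; HARD_MATCH rule skipped")
        return None
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"target_account_number": target_acc_number}}
                ]
            }
        }
    }


print(build_hard_match_query(None))  # None
print(build_hard_match_query("0123456789"))  # made-up account number
```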
- [ ] merchant ID in MERCHANT_POS_DR rule
  - Sample xid: TF240301250638671
  - Expected tag: **Shopping**
  - Actual tag: **Food & Drink**
  - Reason: the back end cannot get the `merchant_id` needed to query Elasticsearch, as seen in the log linked below: `merchantNumber=null`
  - ~~Goal: the `merchantNumber` of debit card transactions is passed to the Elasticsearch query builder, i.e. `merchantNumber != null`~~
  - New solution: use the merchant name instead, due to a technical issue.
  - Log [here](https://timovn-my.sharepoint.com/personal/hoc_chuc_timo_vn/Documents/Microsoft%20Teams%20Chat%20Files/Log%20details-logs-2024-03-05%2014_22_06.txt)
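Matching on a merchant name is more fragile than matching on an ID, so names usually need normalizing before lookup. The minimal sketch below prefers `merchantNumber` when it is present and otherwise falls back to the normalized name. It assumes hypothetical lookup tables `MERCHANT_TAGS_BY_ID` and `MERCHANT_TAGS_BY_NAME`, and the merchant name in them is made up. Vietnamese diacritics are stripped via Unicode NFD. `đ` is mapped by hand because Unicode does not decompose it:

```python
import re
import unicodedata

# Hypothetical lookup tables; the merchant name below is made up.
MERCHANT_TAGS_BY_ID = {}
MERCHANT_TAGS_BY_NAME = {"cua hang duc anh": "Shopping"}


def normalize_name(name):
    """Lowercase, strip Vietnamese diacritics and punctuation, collapse spaces."""
    # NFD does not decompose "đ", so map it by hand first.
    text = name.lower().replace("đ", "d")
    text = unicodedata.normalize("NFD", text)
    # Drop combining marks (tones, circumflex, breve, horn).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(text.split())


def lookup_merchant_tag(merchant_number, merchant_name):
    """Prefer merchantNumber when present, otherwise fall back to the name."""
    if merchant_number:
        return MERCHANT_TAGS_BY_ID.get(merchant_number)
    if not merchant_name:
        return None
    return MERCHANT_TAGS_BY_NAME.get(normalize_name(merchant_name))


print(normalize_name("Đồ Ăn  Nhanh!"))               # do an nhanh
print(lookup_merchant_tag(None, "CỬA HÀNG ĐỨC ANH"))  # Shopping
```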
- [ ] merchant collision

- [x] remove name
- [ ] merchant name does not match
  - Sample xid: TF240323155050424
  - Expected tag: **Entertain** (because the merchant name is `CGV`)
  - Actual tag: **Shopping**
  - Reason: the merchant name has not been updated.
  - Solution: the merchant name will be updated automatically when the next version is launched.
- [ ] mua xoai (buying mangoes) - sis Tam
  - Sample xid: TF240326155569904
  - Expected tag: **Groceries** (the description is about buying mangoes)
  - Actual tag: **Fashion**
  - Reason: `xoài` (mango) is not in our keyword pool. It is also too short, so adding it to the pool would cause noise.
  - Solution: remaster the keyword rules (HARD!).
- [ ] Credit card - `ck`
  - Any transaction whose description includes `ck` is tagged as Credit card, which is wrong. `ck` is the common abbreviation of *chuyển khoản* (bank transfer), as in `E Van ck com am phu`.