---
title: CDC Diagnosis (working)
tags: ded
---
**TODO**:
Ask IT
- [x] Contact IT support to get info of `ACCOUNT` and `STMT_ENTRY` table (size, number of rows)
- [x] Contact IT support to confirm that the data contained in current DWH is data from 2021
- [x] How to get trigger condition for:
+ DBLINK: `INFO_CARD`
+ OGG: `ACCOUNT`, `STMT_ENTRY`
+ Batch: `VDWHPROCESSINGGATEINAMC`
- [x] How to get trigger condition for: (đã gửi mail hỏi IT)
+ SQL procedure: `EDW_DMT.R_TBL_LN_0016_PLN`
+ SQL procedure: `EDW_DMT.R_TBL_LN_0024_HR`
Other
- [x] Ask Duong Risk: Description of table `VDWHPROCESSINGGATEINAMC`
- [x] Ask Duong Rick: Output of feature list? A table contains all features and by customerId, process_date or what?
- [x] Ask Duong Risk: the timestamp needed for each daily snapshot fact table and transaction fact table (eg. Need last 6 months? or need the whole history in last 3 years?)
To follow:
- [ ] Follow up with Duong risk on final version of feature lists. He said end of this week will complete.
**NOTE**:
- All data table in `DWH` contains data from `2021-01-01`. All past data need to get from `DWH2018/DWH2019/DWH2020`
- All data estimated below based on `DWH`
## `DWH.ACCOUNT`, `T24: FBNK_ACCOUNT`
**Description**: Daily snapshot of customer account.
**Logic ETL from source**:
- ETL 1-1 from source, add `PROCESS_DATE` as ETL day.
PS5 needs historical data to calculate casa balance and overdraft balance of each account.
- Load all data in `DWH.ACCOUNT/V_MASK_ACCOUNT` for historical data, remember to mask salary account (`product_category=1006`)
+ Size/Number of rows in total: ~ 1 billion rows
+ Partitioning based on: `PROCESS_DATE`
- Get daily data from source `T24: FBNK_ACCOUNT`
+ Batch loading
+ Estimate workload daily: ~ 8 million rows
+ How and When to trigger job?
- S3 structure: `YEAR/MONTH/DATE/CUSTOMER_ID/ACCOUNT_NUMBER`
## `DWH.STMT_ENTRY`, `T24: FBNK_STMT_ENTRY`
**Description**: Card transaction fact table.
**Logic ETL from source**:
- ETL 1-1 from source
PS5 needs all historical data.
- Load all data in `DWH.STMT_ENTRY`, remember to mask transactions from salary account (`product_category=1006`)
+ Size/Number of rows in total: , contains data from `2021-01-01` to `T-1`
+ Partitioning based on: `PROCESSING_DATE`, `CUSTOMER_ID`
- Get daily data from source `T24: FBNK_STMT_ENTRY`
+ Batch loading (_or streaming, which one is better?_)
+ Estimate workload daily: ~ 16 million rows
+ CDC based on `BOOKING_DATE` (Get all transaction in current processing date)
- S3 structure: `YEAR/MONTH/DATE/CUSTOMER_ID/ACCOUNT_NUMBER` (year, month, date extracted)
## `DWH.INFO_CARD`, `T24: FBNK_INFO_CARD`
**Description**: Daily snapshot of customer card.
**Logic ETL from source**:
- `INFO_CARD`: ETL 1-1 from source, get lastest data.
- `INFO_CARD_TSD`: ETL 1-1 from source, add three more fields:
+ `SURROGATE_KEY`: `INFO_CARD_TSD_SEQ.NEXTVAL`
+ `CURRENT_RECORD`: 1
+ `CREATION_DATE`: equals `to_date('#v_gbl_gg_processing_date','YYYYMMDD')`
+ `END_DATE`: equals `to_date('24000101','YYYYMMDD')`
PS5 needs only lastest data.
- (optional for historical data) Load all data in `DWH.INFO_CARD_TSD` (SCD type 2 table)
+ Size/Number of rows in total: 11Gb, 19 million rows, contains data from `2017-01-03`
+ Batch loading, cut file based on `SURROGATE KEY`
- Get daily data from source `T24: FBNK_INFO_CARD`
+ Batch loading
+ Estimate workload daily: 9Gb, ~ 11 million rows
+ How and When to trigger job?
- S3 structure: `YEAR/MONTH/DATE/CUSTOMER_ID/ACCOUNT_NUMBER`
## `CMS: DWH_COLLECTIONCENTRALBANK`
**Description**: Daily snapshot of credit cards.
**Logic ETL from source**:
- ETL 1-1 from source, add `PROCESS_DATE` as ETL day.
PS5 needs historical data (to calculate credit card statement balance in last 6 months).
- Load all data in `DWH.CMS_COLLECTIONCENTRALBANK`
+ Size/Number of rows in total: ~60 million rows, contains data from `2020-01-01` to `T-1`
+ Partitioning based on `PROCESS_DATE`, `PRODCODE`
- Get daily data from source `CMS: DWH_COLLECTIONCENTRALBANK`
+ Batch loading
+ Estimate workload daily: 500k rows
+ Trigger: check logic in function `FNC_CHECK_CMS_COB_STATUS` and `FNC_CHECK_CMS_COB_SCH`
- S3 structure: `YEAR/MONTH/DATE/CUSTOMER_ID/PRODCODE/ACCOUNT_NUMBER`
## `DMS: VDWHPROCESSINGGATEINAMC`
**Description**: Type IPL (upsert), save lastest information of???
**Logic ETL from source**:
- ETL 1-1 from source, add `PROCESS_DATE` as ETL day.
For PS5:
- No need to initial load from DWH
- Directly daily load from source:
+ Batch loading, daily job (ETL by truncating and repopulating)
+ Size/Number of rows: ~2 millions rows
+ Trigger: ???
- S3 structure: `CUSTOMER_ID/IDCARD` or `PROCESSINGGATE/CUSTOMER_ID`
Schedule ETL data at source:
- Data từ DWH được đổ vào bảng Temp (1.1), rồi từ bảng Temp đổ vào các view trên DMS (1.2). Bước 1.1 thì real time; còn bước 1.2 thì đc schedule vào 7h, 12h, 20h và được chạy trong khoảng 30-60'.
- Dữ liệu trên view sẽ là dữ liệu tại thời điểm bên chị kéo
## `EDW_DMT.R_TBL_LN_0016_PLN`
**Description**: Daily snapshot for customer's loan contract.
For PS5:
- Initial load from `EDW_DMT.R_TBL_LN_0016_PLN`
+ Size/Number of rows in total: ~430 million rows, 82Gb, contains data from `2017-07-11` to `T-1`
+ Partitioning based on: `NGAY_BAO_CAO` or `BUT_TOAN` or `NGAY_BAO_CAO/CUSTOMER_ID`
- Get daily data from `EDW_DMT.R_TBL_LN_0016_PLN`, where `NGAY_BAO_CAO` is the current ETL date:
+ Batch loading
+ Estimate workload daily: 500k rows
+ Trigger: ??? (job not run during holiday/sunday)
- S3 structure: `NGAY_BAO_CAO/CUSTOMER_ID/BUT_TOAN`
Schedule ETL data at source:
-
## `EDW_DMT.R_TBL_LN_0024_HR`
**Description**: Daily snapshot of employees of Techcombank.
For PS5:
- Initial load from `EDW_DMT.R_TBL_LN_0024_HR`
+ Size/Number of rows in total: ~6.5 mil rows, contains data from `2019-07-19` to `T-1`
+ Partitioning based on `PROCESS_DATE`
- Get daily data from `EDW_DMT.R_TBL_LN_0024_HR`, where `PROCESS_DATE` is the current ETL date:
+ Batch loading
+ Estimate workload daily: ~14k rows
+ Trigger: ??? (job not run during holiday/sunday)
- S3 structure: `YEAR/MONTH/DATE`