# SEP CRM Data Ingestion ###### tags: `SEP`, `CRM`, `data ingestion`, `sftp` ## Change logs | Date | Content | Author | | ---------- |:----------------:|:-----------------------------------:| | 2021/01/27 | Added data security section and revised flow diagram | Nicky Goh [ nicky.goh@cyntelli.com] | | 2020/12/10 | Initial document | Nicky Goh [ nicky.goh@cyntelli.com] | ## Process Flow ![](https://i.imgur.com/ufewFhW.png) [](https://i.imgur.com/b1bhUyM.png) ## Data Security - IP whitelisting for SFTP server - Service endpoints are protected by HTTPS - Access token required for API(s) usage - Sensitive privacy data (phone, email, residential address and etc) should be encrypted in **SHA256** ## Source Type Each type of raw data source will be represented as following: For example: | Source Type | Source Name | Description | |:-----------:|:-------------:|-------------------------------------------------| | order | Order | Daily transaction data | | member | Member | Member's information and personal details | | product | Product | Product catalog with unit price | | category | Category | Product categories with hierarchy structure | | channel | Channel | Sales channel included offline store and e-comm | | redeem | Redeem | Redeem history from freebies and campaigns | | points | Loyalty Point | Point granted based on purchase activities | ## SFTP ### Account Policy - Limited to a maximum of 5GB on the SFTP workspace - 1GB of daily upload traffic quota ### Account Authorization User need to obtains credentials in order to access and upload files - SFTP Endpoint - Access Username - Access Password - IP address for whitelisting - API access token - Account-Binded Work Directory ### File Specification #### Naming Pattern Rule ``` {DATE(YYYYMMDD)}_{SOURCE TYPE}.{FILE EXTENSION} ``` For instance, > 20200125_order.csv (Order data of 2020/01/25) > 20200125_member.csv (Member data of 2020/01/25) #### Format The following table lists supported ingest formats. | Type | Encoding | Header Required | | ---------------------------- |:-------------------- |:--------------- | | CSV (Comma-Seperated Values) | UTF-8 (Without BOM) | Y | | TSV (Tab-Seperated Values) | UTF-8 (Without BOM) | Y | The first record as a header containing a list of field names **must be provided** for schema validation <sub>BOM is represents as Byte Order Mark</sub> #### Field Name - Field name must be lowercase - Field name should not contains any white space - Field name must be connected by using underscore ( _ ), such as *cust_name, contact_phone_no, contact_email* - Naming should be same as the field name in metadata - The number of fields should be identical with metadata ### File Upload User only allowed to access the work directory under their own account id and expected to place the uploaded files under the particular directory according to assigned source type *The misplaced files will be ignored by the system and not be processed* ``` └───110-1234-567 (account id) | | └───sources │ │ └───order | | | | | └───20200125_order.csv (Order data) | └───member | | └───20200125_member.csv (Member data) ``` ### File Update Update the previous records by overwritting the effected data by date For instance, > Current date is 2020/01/25 but you going to updates few records on 2020/01/20. Therefore, you need to re-upload the updated 2020/01/20 file with same filename to the same directory on SFTP server that would overwrite the previous data ### File Process Files uploaded after the pick up time will be processed on the next day Files will be moved to *processed* directory after they are all processed. Submitted files that failed to pass schema validation process will be moved to *invalid* directory ### File Archive Uploaded files will be kept as original form with a *maximum period up to 7 days*. Older files will be compressed and archived. ## Metadata The schema of each data source that going to upload to SFTP and uses in data processing and validation Metadata will be created by Cyntelli at the beginning after finalized the schema of each data source with user ### Query Schema Get the schema of a particular data source #### Request GET `v1/{accuid}/sources/{source_type}/schema` #### Response Status Code: `200` ```json= { "source_type": "member", "source_name": "Member", "description": "Member's information and personal details", "schema":[ { "name": "id", "type": "Integer" }, { "name": "name", "type": "String" }, { "name": "birthdate", "type": "Date" }, { "name": "address", "type": "String" }, { "name": "phone", "type": "String" } ], "updated_at": 1611563094, "updated_by": 1, "created_at": 1593563094, "created_by": 0 ``` ### Update Schema Update an existing schema of a data source #### Request PATCH `v1/{accuid}/sources/{source_type}/schema` ```json= { "source_name": "MemberV2", "description": "updated member's information and personal details", "schema":[ { "name": "id", "type": "Integer" }, { "name": "name", "type": "String" }, { "name": "birthdate", "type": "Date" }, { "name": "address", "type": "String" }, { "name": "phone", "type": "String" }, { "name": "email", "type": "String" } ] } ``` #### Response Status Code: `200` ```json= "source_type": "member", "source_name": "MemberV2", "description": "Updated member's information and personal details", "schema":[ { "name": "id", "type": "Integer" }, { "name": "name", "type": "String" }, { "name": "birthdate", "type": "Date" }, { "name": "address", "type": "String" }, { "name": "phone", "type": "String" }, { "name": "email", "type": "String" } ], "updated_at": 1613563094, "updated_by": 2, "created_at": 1593563094, "created_by": 0 ``` ### Data Type Reference https://hackmd.io/dU2Wp2TcTiKYxD0eJ4XB3g?view ## Job Inspection User may check the job status through API either the job is success or failed status with error message provided. ### Query Job Status #### Request GET `v1/{accuid}/sources/{source_type}/job?date={YYYYMMDD}` #### Response Status Code: `200` Job Success ```json= "source_type": "order", "source_name": "Order", "file_name": "20200125_order.csv", "job_id": "78181ab8f4b34a5b80c09da1dc81d76d", "process_time": 1621563094, "status": "success" ``` Job Failed ```json= "source_type": "order", "source_name": "Order", "file_name": "20200125_order.csv", "job_id": "78181ab8f4b34a5b80c09da1dc81d76d", "process_time": 1621563094, "status": "failed", "error_msg": "Invalid data type in column [cust_phone_no]" ``` ### Mail Notification Send mail to the user that notify the job has failed