--- tags: DW --- # Embedded rules ## Ingestion of the JSON Compliance reports The compliance reports are aggregated in the same directory and parsed. If there are 2 versions of these reports (since the last DW workflow), these will be processed together. The only option is to have these 2 versions in two different folders. The option of checking for the a presence of a new column will not work ``` if report.has_column(rules.families): process_new(report) else: process_old(report) ``` Other option is to have the new version of the compliance reports (note that we are not removing fields, so it's back compatible with the DW) in prod at least 24 hours before passing the DW to production. So when the new code is in prod, it only encounters non processed reports with the new embedded families. ## Modifications Assuming the new compliance reports have the format: ```json= { "environment_id": "b584a224-fe7e-44d2-ad8c-0270e5b5546a", "scan_id": "944a624c-a1e6-4818-9830-8562a107a3bf", "rules": { "azurerm/app_service/web_app_auth.rego": { "enabled": true, "controls": [ "CIS-Azure_v1.1.0_9.1" ], "families": { "CIS-Azure_v1.1.0": { "controls": [ "CIS-Azure_v1.1.0_9.1" ] }, "UDF 1": { "direct": true } }, "severity": "Medium", "rule_id": "FG_R00345", "custom": false }, (...) }, "resources": [ (....) ], "controls": { "CIS-Azure_v1.1.0_1.23": { "standard": "CIS-Azure_v1.1.0", "enabled": true }, (...) "custom/Custom rule 3": { "standard": "Custom", "enabled": true } }, "rule_waivers": [] } ``` ### Glue tables - Create a new entry (e.g. `GlueJSONComplianceReportsV4V20210616`) for the JSON schema of the Compliance reports in `cloudformation/glue-cloud-data-warehouse.yaml`: - Modify the `rules` entry with the new embedded info: ```yaml - Name: rules Type: "map<string,struct<result:string,enabled:boolean,missing_resources:array<string>,\ controls:array<string>,severity:string,rule_id:string,custom:boolean,\ families:map<string,struct<controls:array<string>,direct:boolean>>>>" ``` ### Glue jobs The current job responsible to parse the compliance reports is: `glue_scripts/redshift/flatten_compliance_report_v4.py` This job calls: `spark_jobs/relationalize_compliance_report_v4/relationalize_compliance_report_v4.py` and produces the parquet files `rule_results` and `rule_control_mappings` The code responsible for flattening the JSON and building the `rule_control_mappings` is the following: ```python= controls = flatten_controls(compliance_reports) rules = flatten_rules(compliance_reports).cache() control_rule_map = build_control_rule_map(rules, controls) ``` In the next version, we do want to keep the flattening of the controls - we want to keep the situation where we have controls without rules! When building the control map, the flattened rules are not needed any longer. All the necessary info is in the embedded families. ```python= embedded_families = flatten_embedded_families(df) controls = flatten_controls(df) control_embedded = build_control_embedded(embedded_families, controls) ``` The `build_control_embedded` function joins embedded families and controls on control name and builds the `rule_control_mappings` table. Result: ``` +--------------------------------+------------------------------------+------------------------------------+----------------+----------------------+------------------------------------+---------+----------+----+ |id |environment_id |scan_id |family |control_name |rule_id |tenant_id|created_at|dt | +--------------------------------+------------------------------------+------------------------------------+----------------+----------------------+------------------------------------+---------+----------+----+ |6798d9e3611936e8ad1d6ff1502e3a82|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.4 | |null |null |null| |e15777d6284835c7b142acaeee938351|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_8.4 | |null |null |null| |f5b165f3d5d63c7e921e539aea2a11fc|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.13 | |null |null |null| |e2cc2a60dd943ef9b1c662c0da6bd8f5|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.1.1| |null |null |null| |7d258e17617438d1b282b6ef4d1d4ea7|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.14 | |null |null |null| |5eec0a393d51376aab07ee034b343ee8|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_6.2 | |null |null |null| |51a9aae6a275347291e90f3e9e774b3b|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|Custom |custom/Custom rule 2 |6578e2e6-acec-4a81-9ab7-02a9d9102968|null |null |null| |c11ce5effc35373e82ddf5796909aef0|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.3 | |null |null |null| |000332ee7c183138a91a4842027a5e1b|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_1.23 | |null |null |null| |ed5381178dfc35d4a81a63a5c10fb167|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.2.8| |null |null |null| |ff4b946007ab38f79f504ab34cab720b|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.1 | |null |null |null| |08122e73257b3d2eba3aa8f6fa2fd362|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.2.2| |null |null |null| |d9e4a30dbc3d3d9280a064478fae50f8|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.6 | |null |null |null| |07230927fcb036dda93d567647b0170b|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.12 | |null |null |null| |c58ed67a00293790b6e1c36a0ffe23b7|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.2.9| |null |null |null| |7a2d106ed9113054bd852b5415a4f306|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.10 | |null |null |null| |637aa1ffd11d361793675f77345a2cb6|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_4.15 | |null |null |null| |13bfb00e9aaa393e8a83d3ce0553717f|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_9.1 |FG_R00345 |null |null |null| |2e59ee424c853a95a0eed2b814fdc0b6|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.2.6| |null |null |null| |e37c2d8101413dfeb7002272c0ebea3b|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.7 | |null |null |null| |64f8163539bf3368a26d43f51dac9bb5|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.16 | |null |null |null| |c117477cb2ce394b958f66581e9b2829|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.9 | |null |null |null| |26169a59d6ed34829b130f9ab8eaa9c2|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.1.7| |null |null |null| |42e887716b5233e99618d4857c4273eb|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|Custom |custom/Custom rule 3 |367477b5-f4f0-4837-bb9c-192f2764cc2f|null |null |null| |f5f3c51c106834c2954c31c37deb5721|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|UDF 2 | |6578e2e6-acec-4a81-9ab7-02a9d9102968|null |null |null| |cbe95dcbe21335e1838e8bf914964cc3|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_6.5 | |null |null |null| |37538acf363331d287ef7ffd6cd8e837|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.15 | |null |null |null| |fe4026f8152f38f6a4f082626cc2ee16|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_9.4 | |null |null |null| |6a0184fda11230f299d12905853bc96e|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_9.3 |FG_R00346 |null |null |null| |0b585fd634fa3975a6d29895ff9ede6e|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|UDF 2 | |FG_R00346 |null |null |null| |fbd065f4950739d1937daddc7f679aa4|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.2.5| |null |null |null| |3a1a99f1fb653bcc9a9d04ed32766201|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_4.11 | |null |null |null| |f8bc5cf2de493fa882441238ed6d73a6|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.8 | |null |null |null| |2cfc245a583d3e47877c3c9e8330d665|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.2.4| |null |null |null| |930ad83673193e42baf18d7934a38b09|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_3.1 | |null |null |null| |feb05acaa18a3ab496497828a671fae7|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_4.3 | |null |null |null| |7602aa8b4be839c3b8c2142c6e23abe0|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_4.16 | |null |null |null| |38a2faf15bc83642bbc427233ccbe0f5|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_4.13 | |null |null |null| |6ff0f1743b3232e58370b29e120d4f6b|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_4.1 | |null |null |null| |f04d8686f2ba3a2d999f43b2f89a84ff|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_6.1 | |null |null |null| |e8a47341bcf63a4582bc0b33ded7e3b2|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_4.14 | |null |null |null| |7e60a5b62b0334a5ab498b44ad10a451|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_9.2 |FG_R00346 |null |null |null| |553fc36f13843b15902f585a5c7e8518|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_7.3 | |null |null |null| |c4f3ff3fdd203415a05dc4235e6801f5|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_6.3 | |null |null |null| |97ce1641d2f83940ab02647e5279b495|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.5 | |null |null |null| |d47e2fddab273739a6f60acfc886794d|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|UDF 3 | |367477b5-f4f0-4837-bb9c-192f2764cc2f|null |null |null| |a19d362fabcd37a990b7bead84e5d69a|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.1.2| |null |null |null| |f7f95a68d07f3e71b6665fa7becfb595|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_2.11 | |null |null |null| |795b19aac64a3909ade4e8f22b2d4746|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_4.12 | |null |null |null| |8ed141d614e13e1f81796dd0e05e5dd6|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.2.7| |null |null |null| |7f72c6e8e3f03a7cad15e5bea53b9448|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_4.18 | |null |null |null| |1a6b347b3e4a3c269520bd4ecdf0606a|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.2.3| |null |null |null| |bc8d1a7b98f53f7d96a569e9efe34d9e|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_8.5 | |null |null |null| |e3769705f6713c169671e7f9f84df3b0|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.1.4| |null |null |null| |1480906c71db3814bd1ce63fc496b5d0|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_7.2 | |null |null |null| |a1de05159de63cffa96a87f7d26f9b16|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_6.4 | |null |null |null| |dccf25c103403bbcb3336852744b4ef7|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.2.1| |null |null |null| |a581852c10c13085b9b303482ea23325|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_5.1.3| |null |null |null| |25ed1affa66b347db36dd0a68c66d925|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|CIS-Azure_v1.1.0|CIS-Azure_v1.1.0_4.17 | |null |null |null| |a4263cca38a438c0b1d3e68aa7b074e9|b584a224-fe7e-44d2-ad8c-0270e5b5546a|944a624c-a1e6-4818-9830-8562a107a3bf|UDF 1 | |FG_R00345 |null |null |null| +--------------------------------+------------------------------------+------------------------------------+----------------+----------------------+------------------------------------+---------+----------+----+ ``` ### Code Branch: `experiment/compliance-report-v5` See: https://github.com/LuminalHQ/fugue-data-warehouse/blob/experiment/compliance-report-v5/tests/flatten_compliance_report_v5.ipynb for the code that generates the control-rule mapping. ### Notes ### If there's time.. Extra column in all main tables that would help to delete duplicated data: ```sql= created_at datetime default sysdate; ```