---
title: Work summary
tag: UoM, RSE, ArcGisData
date: 01-Nov-2022
---

# Update

See the latest `.gdb` [here][latestGdb].

---

# Work Summary

Hi Vasileios Vlastaras,

Please see [`2022-11-01-ClimateJust_Chris.gdb`][01NovGdb].

## Automated optimization processes

The current optimization automates the following operations:

- convert `double` to `float` (buggy, see [Notable issues - `double` to `float` decimal problem](#double-to-float-decimal-problem))
- convert `long` to `short`
- reduce a `text` field's `length` to its longest string length, rounded up to the next multiple of 10
- `null` value check: if a `null` is found, an assertion error is raised
- unique value check (for code fields)

## Notable issues

### `double` to `float` decimal problem

The `double` to `float` operation successfully turns `double` fields into `float` fields. However, the converted `float` field values show more decimal digits than the original `double` field values. This problem has not been solved; see the problem details on [Esri Community](https://community.esri.com/t5/python-questions/covert-double-to-float/m-p/1226016#M65949).

### *Nearly `int`* `double` fields

Some `double` fields are nearly `short`, but a few of their values deviate from whole numbers by a small decimal amount; see `ClimateJust_Chris.gdb\EPSG_27700_CJ_18\England_SHVI\AT2_33_f32` below. `short` could potentially be applied if we fixed those individual deviating cases, but I am not sure whether you want to handle this.

![](https://i.imgur.com/1kQYyJY.png)

### Used Domain instead of explicit Lookup table

I created a domain for each code field instead of building an explicit lookup table as before, see [`2022-10-28-ClimateJust_Chris.gdb`][28OctGdb]. In my opinion, building an explicit lookup table works against the purpose of the database optimization, but if you have other considerations, please let me know; the lookup table option is available[^lookupTable].

Note that **one** domain holds exactly **one** set of key-value pairs, e.g. `{k1: v1, k2: v2}`, i.e. each key maps to a single value. It cannot hold `{k1: [v11, v12, v13], k2: [v21, v22, v23]}`, which an explicit lookup table can. I am not sure which one is preferred.

You may find all created domains in [`2022-11-01-ClimateJust_Chris.gdb`][01NovGdb], see below[^code_i32]:

![](https://i.imgur.com/pNvaTJW.png)

## Potential improvements

### Further automation

There is already an automation that checks for duplicated fields to drop, but the current algorithm is naive and slow; see the `staticmethod` `ClimateJustUtil.drop_duplicated_fields` in `ClimateJustToolset.py`.

Another potential automation could be checking whether a field is categorical. If so, create another `int` field to serve as its code, then create a domain and assign it to that code field, e.g. the `country` field in `ClimateJust_Chris.gdb\EPSG_27700_CJ_18\UK_New_CJ_Flood_Data_JOIN`.

### Better domain name format

Currently `domain_name_format = f"{code_field}_Domain"`. This is not ideal because the same `code_field` name could exist in different `FeatureDataset`/`FeatureClass`. Ideally, it should be `domain_name_format = f"{feat_ds_name}_{feat_cls_name}_{field_name}_Domain"`, but this is not an issue currently.

## TODO

- [x] lookup table is preferred, undo the Domain approach.
- [x] toolbox for reporting field nullability
- When no null value is confirmed:
  - [x] turn nullability to `No`
    - `field_is_nullable = 'NON_NULLABLE'`
    - https://support.esri.com/en/bugs/nimbus/QlVHLTAwMDA5NDIyMw==
  - [x] set the field default value to something other than `Null`
    - https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/assign-default-to-field.htm

---

[latestGdb]: https://livemanchesterac.sharepoint.com/:u:/s/UoM-NERC-DSA/EQ9COPiKDe9Iho_evPmXkpcBVdZ7eOIn8zyvo4ocFFy5gg?e=vdLQic
[01NovGdb]: https://livemanchesterac.sharepoint.com/:u:/s/UoM-NERC-DSA/EbyBcaFpoahIn7HAZ2RaOWIBp3mYjHyr41Rg6Qb-UBRulQ?e=GBDqRB
[28OctGdb]: https://livemanchesterac.sharepoint.com/:u:/s/UoM-NERC-DSA/Ec92jkqYKthFs5vpOUQl0LYBF2laGikd1KesGaugL8liaw?e=9YE3R7

[^lookupTable]: A lookup table implementation already exists, so it can be an immediate option if preferred.
[^code_i32]: `code_i32` is `LSOA11CD`; it has been corrected in the [latest `.gdb`][latestGdb].
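
The checks listed under *Automated optimization processes* can be sketched in plain Python on a field's values (for example, values already read with a search cursor, which is assumed and not shown). This is a minimal sketch, not the toolset's code: the helper names `fits_short`, `reduced_text_length`, `assert_no_nulls` and `is_unique` are hypothetical, and the floor of 10 characters for text fields is an assumption so that an all-empty field never gets length 0.

```python
import math

# An Esri `short` field is a signed 16-bit integer.
SHORT_MIN, SHORT_MAX = -32768, 32767


def fits_short(values):
    """Return True if every (non-null) integer value fits in a `short` field."""
    return all(SHORT_MIN <= v <= SHORT_MAX for v in values)


def reduced_text_length(values):
    """Longest string length rounded up to a multiple of 10 (an exact multiple is kept)."""
    longest = max((len(v) for v in values if v is not None), default=0)
    # Hypothetical floor of 10 so the suggested length is never 0.
    return max(10, math.ceil(longest / 10) * 10)


def assert_no_nulls(field_name, values):
    """Raise AssertionError if the field contains any null (None) value."""
    null_count = sum(v is None for v in values)
    if null_count:
        # Raised explicitly rather than with `assert`, which `python -O` strips out.
        raise AssertionError(f"{field_name} has {null_count} null value(s)")


def is_unique(values):
    """Return True if no value repeats, as expected for a code field."""
    return len(set(values)) == len(values)


print(fits_short([12, 305, -7]))                         # True
print(reduced_text_length(["E01000001", "E01000002"]))  # 10
print(is_unique(["E01000001", "E01000001"]))             # False
```

Running the `null` check first matters here: `fits_short` compares values with integers, so it assumes the field is already null-free.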
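
A likely explanation for the *`double` to `float` decimal problem* (an interpretation, not a confirmed diagnosis) is that a 32-bit `float` stores the nearest binary fraction to each value, not the decimal value itself; when that stored value is later shown at 64-bit double precision, the rounding error appears as extra digits. The sketch below mimics the conversion with the standard-library `struct` module by packing a Python float into 4 bytes and unpacking it again. It does not call the ArcGIS tool, and `to_float32` is a hypothetical helper name.

```python
import struct


def to_float32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


original = 0.1
converted = to_float32(original)

print(repr(original))    # 0.1
print(repr(converted))   # 0.10000000149011612 (float32 error shown at double precision)

# A float32 keeps only about 7 significant decimal digits, so formatting with
# 7 significant digits hides the error when the value is displayed.
print(f"{converted:.7g}")  # 0.1
```

If a field's values need more than about 7 significant digits, `float` loses real precision rather than just display precision, so keeping `double` may be the safer choice for such fields.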
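
For the *Nearly `int`* `double` fields, one way to list the individual deviating cases is to compare each value with its nearest integer under a tolerance. In this sketch, `nearly_int_report` and the `abs_tol=1e-6` default are hypothetical choices, the sample values are made up, and the field is assumed to be null-free (which the `null` value check already enforces).

```python
import math

# Esri `short` range: signed 16-bit integer.
SHORT_MIN, SHORT_MAX = -32768, 32767


def nearly_int_report(values, abs_tol=1e-6):
    """Find values that are not within abs_tol of an integer.

    Returns (deviating, fits_short): a list of (row_index, value) pairs that
    deviate from their nearest integer, and whether every value, once rounded,
    fits in a `short` field.
    """
    deviating = [
        (i, v)
        for i, v in enumerate(values)
        # rel_tol=0 so only the absolute tolerance decides, even for large values.
        if not math.isclose(v, round(v), rel_tol=0.0, abs_tol=abs_tol)
    ]
    fits_short = all(SHORT_MIN <= round(v) <= SHORT_MAX for v in values)
    return deviating, fits_short


sample = [3.0, 12.0, 7.25, 41.0000001]  # made-up values
deviating, fits_short = nearly_int_report(sample)
print(deviating)   # [(2, 7.25)]
print(fits_short)  # True
```

If `deviating` is empty and `fits_short` is `True`, the field is a candidate for `short`; otherwise the listed rows are the cases to inspect or fix first.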
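
For the slow duplicated-field check under *Further automation*, one possibly faster approach is to group fields by their whole column contents in a dictionary instead of comparing every pair of fields. This is a hypothetical alternative, not a description of `ClimateJustUtil.drop_duplicated_fields`, whose internals are not shown here. With *m* fields it replaces about *m*(*m* - 1)/2 pairwise column comparisons with one hashed lookup per field. The sketch assumes the columns have already been read into a `{field_name: [values in row order]}` dict (for example in one cursor pass, not shown); the helper name and sample data are made up.

```python
from collections import defaultdict


def find_duplicated_fields(columns):
    """Group field names whose value sequences are identical.

    columns: dict mapping field name -> list of values in row order.
    Returns a list of groups, each with two or more field names whose
    contents are equal, in the order the fields were first seen.
    """
    groups = defaultdict(list)
    for name, values in columns.items():
        # A tuple of the whole column is hashable; the dict compares keys by
        # hash and then equality, so only truly identical columns are grouped.
        groups[tuple(values)].append(name)
    return [names for names in groups.values() if len(names) > 1]


columns = {
    "LSOA11CD": ["E01000001", "E01000002", "E01000003"],
    "code": ["E01000001", "E01000002", "E01000003"],
    "pop": [1500, 1620, 1480],
}
print(find_duplicated_fields(columns))  # [['LSOA11CD', 'code']]
```

Because `1 == 1.0` in Python, an integer column and a float column holding the same numbers are grouped together; grouping by field type first would avoid that if it is unwanted. The trade-off is memory, since a tuple copy of each column is kept while grouping.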
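
The categorical-field idea under *Further automation*, combined with the qualified name from *Better domain name format*, can be sketched as follows. `encode_categorical` and `domain_name` are hypothetical helpers, the `country` values are made up, and starting codes at 1 in first-seen order is an arbitrary choice. The returned `{code: description}` dict is the shape a coded-value domain holds (one value per key); in ArcGIS the domain itself would then be created and assigned with geoprocessing tools such as `arcpy.management.CreateDomain`, `arcpy.management.AddCodedValueToDomain` and `arcpy.management.AssignDomainToField`, which this sketch does not call.

```python
def encode_categorical(values):
    """Give each distinct category an integer code, in first-seen order.

    Returns (codes, domain): one code per input value, and a {code: category}
    mapping with exactly one value per code, like a coded-value domain.
    """
    category_to_code = {}
    codes = []
    for v in values:
        if v not in category_to_code:
            category_to_code[v] = len(category_to_code) + 1  # codes start at 1
        codes.append(category_to_code[v])
    domain = {code: category for category, code in category_to_code.items()}
    return codes, domain


def domain_name(feat_ds_name, feat_cls_name, field_name):
    """Build the dataset- and class-qualified domain name format."""
    return f"{feat_ds_name}_{feat_cls_name}_{field_name}_Domain"


codes, domain = encode_categorical(["England", "Wales", "England", "Scotland"])
print(codes)   # [1, 2, 1, 3]
print(domain)  # {1: 'England', 2: 'Wales', 3: 'Scotland'}
print(domain_name("EPSG_27700_CJ_18", "UK_New_CJ_Flood_Data_JOIN", "country"))
# EPSG_27700_CJ_18_UK_New_CJ_Flood_Data_JOIN_country_Domain
```

A lookup table, by contrast, could map each code to several columns, which is the `{k1: [v11, v12, v13]}` shape a single domain cannot hold.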