CRUD Consistency

Summary https://drive.google.com/file/d/1Go2g7Y0clpSYUUpQKUHTMT4zEqeIiS3m/view # Problem When a resource is deleted, any references to that resource may become "broken," thereby compromising referential integrity. This problem is similar to foreign keys for databases - [referencial integrity](https://en.wikipedia.org/wiki/Referential_integrity). [Quora: Why FK is bad](https://www.quora.com/Why-are-foreign-keys-bad#:~:text=It%20adds%20additional%20overhead%20to,if%20anything%20points%20to%20it.) In general, FHIR resources are removed much less frequently than they are created or updated. This is because deletions typically occur only due to erroneous entries. The resources that are usually deleted tend to be leaf resources, which are not extensively referenced by other resources. A lot of resources like Observation or Medication are append only. In most cases, it is more beneficial to use the status element to mark resources as disabled instead of deleting them. This approach helps maintain the integrity of the dataset and avoids the complications associated with broken references. It is two side coin. Sometimes you want to relax ref. integrity and sometimes you don't. In application logic you do not want to store inconsistent data and you want ref. checks. For some complex integrations without ref. integrity you can insert, update or delete data in any order. There is a price of ref. integrity, which you may not want to pay - for example in bulk import. Essentially there are two polуs of ref.integrity and something in between: * no ref. checks * strict ref. checks (full consistency) * soft ref. checks - user can manage when to make checks Honestly, the choice is between **strict** and **flexible**. The good model does not have or minimize anomialies. # Scope of Ref. Checks All operations that do change data are related to reference checks * Create * Update * Patch * Delete * Bulk Insert/Upsert * Data Migrations # Current Sate Aidbox checks references on create/update/patch (POST/PUT/PATCH HTTP methods respectively). But it does not check anything on delete (DELETE method). This leads to a few issues. ### Broken updates after delete Consider the following scenario: 1. Create a Patient resource 2. Create an Observation resource referring to the Patient resource 3. Delete a Patient resource 4. Update the Observation resource Aidbox will not allow updating the Observation resource since the reference to the Patient resource is now broken. # Competitors Summary: SMILE has ref. checks based on search params, Google has strict ref.checks + flag for extensions, others do not have a explicite ref. integrity model. ## SMILE https://smilecdr.com/docs/fhir_repository/creating_data.html * user configure separately create ref chekcs and delete checks * Enforce ref integrity on write and delete, by default true. You can turn off delete but not create/update. * Works for indexed referceces - i.e. by search params * No validation in Bulk Import. * Check extensions??? ### To read: * https://hapifhir.io/hapi-fhir/docs/server_jpa/configuration.html#referential-integrity * zulip thread with direct explanation from James - https://chat.fhir.org/#narrow/stream/179167-hapi/topic/No.20referential.20integrity.20on.20extensions.3F ![image](https://hackmd.io/_uploads/rJ4CbeD80.png) ## Google They claim that they have **absolute integrity** check enabled by default (**disableReferentialIntegrity**), but it is immutable configuration - i.e. defined at storage creation time. Extension refs are disabled by default, should be turned on with **complexDataTypeReferenceParsing** * Discussion - https://chat.fhir.org/#narrow/stream/179166-implementers/topic/referential.20consistency.20on.20FHIR.20servers * https://cloud.google.com/healthcare-api/docs/concepts/fhir-referential-integrity ## firely Firely does not check on create nor on delete. ## medplum Lack of information ## kodjin Tested manually from js console on demo.kodjin.com I was able to delete patient with observation. But in documentation they do have a phrase, that they do check. ![image](https://hackmd.io/_uploads/HJ5MhxPIA.png) ## MS server Does not check ref integrity... # Solutions First we have to choose between strict ref. integrity and soft! [Decision Matrix](https://docs.google.com/spreadsheets/d/1s02_NGsvF3KHWUnYGEiIvgOMuTry_bVeg5ZZSGPw-b8/edit#gid=0) We explore few approaches, which can be categorized into: - Soft deletes; - Checks on delete; - Normalization. These approaches allow to reach increasing levels of consistency, i.e. soft delete provides past consistency, checks provide semiconsistency, normalization provides full consistency. ## Soft delete Allow to create a reference to a resource if it has ever existed (checked via history). This way we make reference check logic a bit more complex (since we need to check both tables now), but otherwise it requires no changes. This is a way to reach past consistency and fix the broken updates issue. ## Check on delete This is a category of approaches enabling semiconsistency. They add no overhead on updates, but instead make deletes more complex. All of these approaches allow to specify which checks to use manually. And it is possible to automatically generate these checks potentially raising the level of consistency. In each description of any of the appoaches we will first describe the manual way and then how to automate it. ### Paths We can record path containing references of interest somewhere (in configuration, in database, etc). Then on delete inspect these paths. Algorithm sketch: Consider the following reference checks configuration: ```yaml - source: Observation target: Patient path: subject - source: Condition target: Patient path: subject - source: Patient target: Organization path: managingOrganization ``` Create some resources: ``` id: org-1 resourceType: Organization --- id: pt-1 resourceType: Patient managingOrganization: reference: Organization/org-1 --- id: obs-1 resourceType: Organization subject: reference: Patient/pt-1 ``` Next we will try to delete the `Patient/pt-1` resource. We need to check if there are any registered references pointing to the `Patient/pt-1` resource. Find all reference checks of interest ```yaml - source: Observation target: Patient path: subject - source: Condition target: Patient path: subject ``` Do the reference check ```sql SELECT EXISTS ( SELECT * FROM Observation WHERE resource @> jsonb_build_object( 'subject', jsonb_build_object( 'id', 'pt-1', 'resourceType', 'Patient' ) ) ) OR EXISTS ( SELECT * FROM Condition WHERE resource @> jsonb_build_object( 'subject', jsonb_build_object( 'id', 'pt-1', 'resourceType', 'Patient' ) ) ) ``` If there are references found, raise an error. #### Automation To automate we can automatically register known reference paths on update, and store it in the database. This approach adds amortized \\( o(1) \\) cost on updates and reaches full consistency. However it leads to a lot of checks depending on GIN index on all resources. Which possible leads to slower updates and much slower deletes. ### SearchParameters This approach is a modification of the previous one. FHIR provides reference search parameter which can link other resources. We can use them as a source of information about useful references. This replaces manual specification of paths with manual specification of search parameters. Aidbox deduces paths from the search parameter definitions. Otherwise it is the same as the paths approach. We do not need paths ... #### Algorythm: On resource delete find all search params with target looking at this resource and run one by one. #### Automation Use all search parameters available in Aidbox. This approach still leads to semiconsistency. #### Impl via SearchParameters: ```clojure (require '[far.client.api :as far-api]) (require '[aidbox.context :as c]) (require '[proto.search.core :as srch]) (def searches (far-api/retrieve-search-parameters (c/get-ar-client (user/devbox-ctx)))) (def pat-searches (filterv #(some #{"Patient"} (get % "target")) searches)) (defn can-delete? [pt-id] (let [result (reduce (fn [_ pt-srch] (let [bases (get pt-srch "base")] (let [srch-per-base (reduce (fn [_ base] (let [srch-result (srch/search (:db/connection (user/devbox-ctx)) (user/devbox-ctx) base (str (get pt-srch "code") "=" pt-id))] (when (> (:total srch-result) 0) (reduced {:error srch-result})))) bases)] (when (get srch-per-base :error) (reduced srch-per-base))))) pat-searches)] (when (:error result) (format "Can't delete due to: %s" (mapv (fn [entry] (str (get-in entry [:resource :resourceType]) "/" (get-in entry [:resource :id]))) (get-in result [:error :entry])))))) ``` #### Notes Does searching in recursive structures work? - Patient.extension.extension.valueReference - Questionnaire.item.item.initial.valueReference ## Normalization All the normalization approaches here provide full consistency. But they differ in the number of intermediate tables and implementation complexity. ### \\( N^2 \\) tables Generate a table per source-rt, target-rt pair, e.g. ```sql CREATE TABLE refs.observation_patient ( source_id text NOT NULL REFERENCES public.observation(id) ON DELETE CASCADE, target_id text NOT NULL REFERENCES public.patient(id) ON DELETE RESTRICT ) ``` On insert Aidbox inserts corresponding rows into these relation tables. These tables are created lazily, since in reality we expect much less than n^2 relationships. ### \\( 2N + 1 \\) tables Generate reference table: ``` CREATE TABLE refs.references ( id uuid NOT NULL PRIMARY KEY ) ``` Generate two tables per resource: ```sql CREATE TABLE refs.out_patient ( resource_id text NOT NULL REFERENCES public.patient(id) ON DELETE CASCADE, ref_id uuid NOT NULL REFERENCES refs.references(id) ON DELETE DO NOTHING ); CREATE TABLE refs.in_patient ( resource_id text NOT NULL REFERENCES public.patient(id) ON DELETE RESTRICT, ref_id uuid NOT NULL REFERENCES refs.references(id) ON DELETE CASCADE ); CREATE TABLE refs.out_observation ( resource_id text NOT NULL REFERENCES public.observation(id) ON DELETE CASCADE, ref_id uuid NOT NULL REFERENCES refs.references(id) ON DELETE DO NOTHING ) CREATE TABLE refs.in_observation ( resource_id text NOT NULL REFERENCES public.observation(id) ON DELETE RESTRICT, ref_id uuid NOT NULL REFERENCES refs.references(id) ON DELETE CASCADE ) ``` On insert add one row into `references` table, and corresponding links. E.g. for the following resource ```json { "resourceType": "Patient", "id": "pt-1", "managingOrganization": { "reference": "Organization/org-1" } } ``` Do the following queries: ```sql INSERT INTO refs.references (id) VALUES (some-uuid); INSERT INTO refs.out_patient (resource_id, ref_id) VALUES ('pt-1', some-uuid, TRUE) INSERT INTO refs.in_observation (resource_id, ref_id) VALUES ('obs-1', some-uuid, FALSE) ``` To delete a resource, do ```sql DELETE FROM refs.references USING refs.out_patient WHERE refs.references.id = refs.out_patient.ref_id AND refs.out_patient.id = 'pt-1'; DELETE FROM patient WHERE patient.id = 'pt-1'; ``` ## \\( N+1 \\) Tables Generate reference table: ``` CREATE TABLE refs.references ( id uuid NOT NULL PRIMARY KEY ) ``` Generate two tables per resource: ```sql CREATE TABLE refs.ref_patient ( resource_id text NOT NULL REFERENCES public.patient(id) ON DELETE RESTRICT, ref_id uuid NOT NULL REFERENCES refs.references(id) ON DELETE CASCADE, is_out boolean NOT NULL ); CREATE TABLE refs.ref_observation ( resource_id text NOT NULL REFERENCES public.observation(id) ON DELETE RESTRICT, ref_id uuid NOT NULL REFERENCES refs.references(id) ON DELETE CASCADE, is_out boolean NOT NULL ) ``` On insert add one row into `references` table, and corresponding links. E.g. for the following resource ```json { "resourceType": "Patient", "id": "pt-1", "managingOrganization": { "reference": "Organization/org-1" } } ``` Do the following queries: ```sql INSERT INTO refs.references (id) VALUES (some-uuid); INSERT INTO refs.out_patient (resource_id, ref_id) VALUES ('pt-1', some-uuid, TRUE) INSERT INTO refs.in_observation (resource_id, ref_id) VALUES ('obs-1', some-uuid, FALSE) ``` To delete a resource, do ```sql DELETE FROM refs.references USING refs.out_patient WHERE refs.references.id = refs.ref_patient.ref_id AND refs.ref_patient.id = 'pt-1'; AND refs.out_patient.out = 'true'; DELETE FROM patient WHERE patient.id = 'pt-1'; ``` ## Magic update explicite param ``` PUT /Observation/obs-id?checkOnlyUpdatedReferences ``` If this param is passed and in case of borken ref, server will check that ref is broken not by client and pass the validation. Database already has an anomalie, and this anomalie influence update, which will fail. Logging bad situation. TODO: discuss it! # Circular references problem ![image](https://hackmd.io/_uploads/Hy7JqAzvC.png) ## To Read * Graham talk about ref. integrity: - https://www.devdays.com/wp-content/uploads/2021/12/DD21US_20210607_Grahame_Grieve_Referential_Integrity.pdf - https://youtu.be/3F-vj_rR6c4?si=7992vlA5oFE-VG1o ## Use cases * GDPR - hard delete