Summary https://drive.google.com/file/d/1Go2g7Y0clpSYUUpQKUHTMT4zEqeIiS3m/view
# Problem
When a resource is deleted, any references to that resource may become "broken," thereby compromising referential integrity.
This problem is similar to foreign keys for databases - [referencial integrity](https://en.wikipedia.org/wiki/Referential_integrity).
[Quora: Why FK is bad](https://www.quora.com/Why-are-foreign-keys-bad#:~:text=It%20adds%20additional%20overhead%20to,if%20anything%20points%20to%20it.)
In general, FHIR resources are removed much less frequently than they are created or updated. This is because deletions typically occur only due to erroneous entries. The resources that are usually deleted tend to be leaf resources, which are not extensively referenced by other resources. A lot of resources like Observation or Medication are append only.
In most cases, it is more beneficial to use the status element to mark resources as disabled instead of deleting them. This approach helps maintain the integrity of the dataset and avoids the complications associated with broken references.
It is two side coin. Sometimes you want to relax ref. integrity and sometimes you don't. In application logic you do not want to store inconsistent data and you want ref. checks. For some complex integrations without ref. integrity you can insert, update or delete data in any order.
There is a price of ref. integrity, which you may not want to pay - for example in bulk import.
Essentially there are two polуs of ref.integrity and something in between:
* no ref. checks
* strict ref. checks (full consistency)
* soft ref. checks - user can manage when to make checks
Honestly, the choice is between **strict** and **flexible**.
The good model does not have or minimize anomialies.
# Scope of Ref. Checks
All operations that do change data are related to reference checks
* Create
* Update
* Patch
* Delete
* Bulk Insert/Upsert
* Data Migrations
# Current Sate
Aidbox checks references on create/update/patch (POST/PUT/PATCH HTTP methods
respectively). But it does not check anything on delete (DELETE method).
This leads to a few issues.
### Broken updates after delete
Consider the following scenario:
1. Create a Patient resource
2. Create an Observation resource referring to the Patient resource
3. Delete a Patient resource
4. Update the Observation resource
Aidbox will not allow updating the Observation resource since the reference
to the Patient resource is now broken.
# Competitors
Summary: SMILE has ref. checks based on search params, Google has strict ref.checks + flag for extensions, others do not have a explicite ref. integrity model.
## SMILE
https://smilecdr.com/docs/fhir_repository/creating_data.html
* user configure separately create ref chekcs and delete checks
* Enforce ref integrity on write and delete, by default true. You can turn off delete but not create/update.
* Works for indexed referceces - i.e. by search params
* No validation in Bulk Import.
* Check extensions???
### To read:
* https://hapifhir.io/hapi-fhir/docs/server_jpa/configuration.html#referential-integrity
* zulip thread with direct explanation from James - https://chat.fhir.org/#narrow/stream/179167-hapi/topic/No.20referential.20integrity.20on.20extensions.3F

## Google
They claim that they have **absolute integrity** check enabled by default (**disableReferentialIntegrity**), but it is immutable configuration - i.e. defined at storage creation time. Extension refs are disabled by default, should be turned on with **complexDataTypeReferenceParsing**
* Discussion - https://chat.fhir.org/#narrow/stream/179166-implementers/topic/referential.20consistency.20on.20FHIR.20servers
* https://cloud.google.com/healthcare-api/docs/concepts/fhir-referential-integrity
## firely
Firely does not check on create nor on delete.
## medplum
Lack of information
## kodjin
Tested manually from js console on demo.kodjin.com
I was able to delete patient with observation.
But in documentation they do have a phrase, that they do check.

## MS server
Does not check ref integrity...
# Solutions
First we have to choose between strict ref. integrity and soft!
[Decision Matrix](https://docs.google.com/spreadsheets/d/1s02_NGsvF3KHWUnYGEiIvgOMuTry_bVeg5ZZSGPw-b8/edit#gid=0)
We explore few approaches, which can be categorized into:
- Soft deletes;
- Checks on delete;
- Normalization.
These approaches allow to reach increasing levels of consistency, i.e.
soft delete provides past consistency, checks provide semiconsistency,
normalization provides full consistency.
## Soft delete
Allow to create a reference to a resource if it has ever existed (checked via history).
This way we make reference check logic a bit more complex (since we need to check both tables now),
but otherwise it requires no changes.
This is a way to reach past consistency and fix the broken updates issue.
## Check on delete
This is a category of approaches enabling semiconsistency. They add no overhead on updates,
but instead make deletes more complex.
All of these approaches allow to specify which checks to use manually.
And it is possible to automatically generate these checks potentially
raising the level of consistency.
In each description of any of the appoaches we will first describe the manual way
and then how to automate it.
### Paths
We can record path containing references of interest somewhere (in configuration,
in database, etc). Then on delete inspect these paths.
Algorithm sketch:
Consider the following reference checks configuration:
```yaml
- source: Observation
target: Patient
path: subject
- source: Condition
target: Patient
path: subject
- source: Patient
target: Organization
path: managingOrganization
```
Create some resources:
```
id: org-1
resourceType: Organization
---
id: pt-1
resourceType: Patient
managingOrganization:
reference: Organization/org-1
---
id: obs-1
resourceType: Organization
subject:
reference: Patient/pt-1
```
Next we will try to delete the `Patient/pt-1` resource.
We need to check if there are any registered references
pointing to the `Patient/pt-1` resource.
Find all reference checks of interest
```yaml
- source: Observation
target: Patient
path: subject
- source: Condition
target: Patient
path: subject
```
Do the reference check
```sql
SELECT
EXISTS (
SELECT * FROM Observation
WHERE resource @> jsonb_build_object(
'subject', jsonb_build_object(
'id', 'pt-1',
'resourceType', 'Patient'
)
)
)
OR EXISTS (
SELECT * FROM Condition
WHERE resource @> jsonb_build_object(
'subject', jsonb_build_object(
'id', 'pt-1',
'resourceType', 'Patient'
)
)
)
```
If there are references found, raise an error.
#### Automation
To automate we can automatically register known reference paths on update,
and store it in the database.
This approach adds amortized \\( o(1) \\) cost on updates and reaches full
consistency. However it leads to a lot of checks depending on GIN index
on all resources. Which possible leads to slower updates and much slower
deletes.
### SearchParameters
This approach is a modification of the previous one.
FHIR provides reference search parameter which can link other resources.
We can use them as a source of information about useful references.
This replaces manual specification of paths with manual specification
of search parameters.
Aidbox deduces paths from the search parameter definitions.
Otherwise it is the same as the paths approach.
We do not need paths ...
#### Algorythm:
On resource delete find all search params with target looking at this resource and run one by one.
#### Automation
Use all search parameters available in Aidbox.
This approach still leads to semiconsistency.
#### Impl via SearchParameters:
```clojure
(require '[far.client.api :as far-api])
(require '[aidbox.context :as c])
(require '[proto.search.core :as srch])
(def searches
(far-api/retrieve-search-parameters
(c/get-ar-client (user/devbox-ctx))))
(def pat-searches
(filterv #(some #{"Patient"} (get % "target")) searches))
(defn can-delete? [pt-id]
(let [result
(reduce
(fn [_ pt-srch]
(let [bases (get pt-srch "base")]
(let [srch-per-base
(reduce (fn [_ base]
(let [srch-result (srch/search
(:db/connection (user/devbox-ctx))
(user/devbox-ctx)
base
(str (get pt-srch "code")
"="
pt-id))]
(when (> (:total srch-result) 0)
(reduced {:error srch-result}))))
bases)]
(when (get srch-per-base :error)
(reduced srch-per-base)))))
pat-searches)]
(when (:error result)
(format "Can't delete due to: %s"
(mapv (fn [entry]
(str (get-in entry [:resource :resourceType])
"/"
(get-in entry [:resource :id])))
(get-in result [:error :entry]))))))
```
#### Notes
Does searching in recursive structures work?
- Patient.extension.extension.valueReference
- Questionnaire.item.item.initial.valueReference
## Normalization
All the normalization approaches here provide full consistency.
But they differ in the number of intermediate tables and implementation
complexity.
### \\( N^2 \\) tables
Generate a table per source-rt, target-rt pair, e.g.
```sql
CREATE TABLE refs.observation_patient (
source_id text NOT NULL REFERENCES public.observation(id) ON DELETE CASCADE,
target_id text NOT NULL REFERENCES public.patient(id) ON DELETE RESTRICT
)
```
On insert Aidbox inserts corresponding rows into these relation tables.
These tables are created lazily, since in reality we expect much less than n^2 relationships.
### \\( 2N + 1 \\) tables
Generate reference table:
```
CREATE TABLE refs.references (
id uuid NOT NULL PRIMARY KEY
)
```
Generate two tables per resource:
```sql
CREATE TABLE refs.out_patient (
resource_id text NOT NULL REFERENCES public.patient(id) ON DELETE CASCADE,
ref_id uuid NOT NULL REFERENCES refs.references(id) ON DELETE DO NOTHING
);
CREATE TABLE refs.in_patient (
resource_id text NOT NULL REFERENCES public.patient(id) ON DELETE RESTRICT,
ref_id uuid NOT NULL REFERENCES refs.references(id) ON DELETE CASCADE
);
CREATE TABLE refs.out_observation (
resource_id text NOT NULL REFERENCES public.observation(id) ON DELETE CASCADE,
ref_id uuid NOT NULL REFERENCES refs.references(id) ON DELETE DO NOTHING
)
CREATE TABLE refs.in_observation (
resource_id text NOT NULL REFERENCES public.observation(id) ON DELETE RESTRICT,
ref_id uuid NOT NULL REFERENCES refs.references(id) ON DELETE CASCADE
)
```
On insert add one row into `references` table, and corresponding links.
E.g. for the following resource
```json
{
"resourceType": "Patient",
"id": "pt-1",
"managingOrganization": {
"reference": "Organization/org-1"
}
}
```
Do the following queries:
```sql
INSERT INTO refs.references (id) VALUES (some-uuid);
INSERT INTO refs.out_patient (resource_id, ref_id)
VALUES ('pt-1', some-uuid, TRUE)
INSERT INTO refs.in_observation (resource_id, ref_id)
VALUES ('obs-1', some-uuid, FALSE)
```
To delete a resource, do
```sql
DELETE FROM refs.references USING refs.out_patient
WHERE refs.references.id = refs.out_patient.ref_id
AND refs.out_patient.id = 'pt-1';
DELETE FROM patient
WHERE patient.id = 'pt-1';
```
## \\( N+1 \\) Tables
Generate reference table:
```
CREATE TABLE refs.references (
id uuid NOT NULL PRIMARY KEY
)
```
Generate two tables per resource:
```sql
CREATE TABLE refs.ref_patient (
resource_id text NOT NULL REFERENCES public.patient(id) ON DELETE RESTRICT,
ref_id uuid NOT NULL REFERENCES refs.references(id) ON DELETE CASCADE,
is_out boolean NOT NULL
);
CREATE TABLE refs.ref_observation (
resource_id text NOT NULL REFERENCES public.observation(id) ON DELETE RESTRICT,
ref_id uuid NOT NULL REFERENCES refs.references(id) ON DELETE CASCADE,
is_out boolean NOT NULL
)
```
On insert add one row into `references` table, and corresponding links.
E.g. for the following resource
```json
{
"resourceType": "Patient",
"id": "pt-1",
"managingOrganization": {
"reference": "Organization/org-1"
}
}
```
Do the following queries:
```sql
INSERT INTO refs.references (id) VALUES (some-uuid);
INSERT INTO refs.out_patient (resource_id, ref_id)
VALUES ('pt-1', some-uuid, TRUE)
INSERT INTO refs.in_observation (resource_id, ref_id)
VALUES ('obs-1', some-uuid, FALSE)
```
To delete a resource, do
```sql
DELETE FROM refs.references USING refs.out_patient
WHERE refs.references.id = refs.ref_patient.ref_id
AND refs.ref_patient.id = 'pt-1';
AND refs.out_patient.out = 'true';
DELETE FROM patient
WHERE patient.id = 'pt-1';
```
## Magic update explicite param
```
PUT /Observation/obs-id?checkOnlyUpdatedReferences
```
If this param is passed and in case of borken ref, server
will check that ref is broken not by client and pass the validation.
Database already has an anomalie, and this anomalie influence update, which will fail.
Logging bad situation.
TODO: discuss it!
# Circular references problem

## To Read
* Graham talk about ref. integrity:
- https://www.devdays.com/wp-content/uploads/2021/12/DD21US_20210607_Grahame_Grieve_Referential_Integrity.pdf
- https://youtu.be/3F-vj_rR6c4?si=7992vlA5oFE-VG1o
## Use cases
* GDPR - hard delete